Abstract — Regenerating codes enable trading off repair bandwidth
for storage in distributed storage systems (DSS). Due
to their distributed nature, these systems are intrinsically sus- 
ceptible to attacks, and they may be susceptible to multiple 
node failures. This paper analyzes storage systems that employ 
cooperative regenerating codes that are robust to (passive) 
eavesdroppers. The analysis is divided into two parts, studying 
both minimum bandwidth and minimum storage cooperative 
regenerating scenarios. First, the secrecy capacity of mini- 
mum bandwidth cooperative regenerating codes is characterized. 
Second, for minimum storage cooperative regenerating codes, 
a secure file size upper bound and achievability results are 
provided. These results establish the secrecy capacity for the 
minimum storage scenario for certain special cases. In all 
scenarios, the achievability results correspond to exact repair, and 
secure file size upper bounds are obtained using mincut analyses 
over a suitable secrecy graph representation of DSS. The main 
achievability argument is based on appropriate precoding of the 
data to eliminate the information at the eavesdropper. 

Index Terms — Coding for distributed storage systems, co- 
operative repair, minimum bandwidth cooperative regenerat- 
ing (MBCR) codes, minimum storage cooperative regenerating 
(MSCR) codes, security. 



I. Introduction 

Distributed storage systems (DSS) are designed to store 
data over a distributed network of nodes. DSS have become 
increasingly important given the growing volumes of data 
being generated, analyzed and archived today. OceanStore 
[1], Google File System (GFS) [2], and TotalRecall [3] are
a few examples of storage systems employed today. Data to 
be stored is more than doubling every two years, and efficiency 
in storage and data recovery is particularly critical today. The 
coding schemes employed by DSS are designed to provide 
efficient storage while ensuring resilience against node failures 
in order to prevent the permanent loss of the data stored on 
the system. In a majority of existing literature, the analysis of 
DSS focuses primarily on isolated node failures. In our work, 
we study a more general scenario of DSS that can suffer from 
multiple simultaneous node failures. In addition to multiple 
node failures, DSS are also inherently susceptible to
adversarial attacks, such as those from eavesdroppers aiming to
gain access to the stored data. Therefore, a "good" DSS would 
meet desired security requirements while performing efficient 
repairs even in the case of multiple simultaneous node failures. 

In [4], Dimakis et al. present a class of regenerating
codes, which efficiently trade off per-node storage and repair
bandwidth for single node repair. These codes are designed to
possess a maximum distance separable (MDS) property, which
is an "any k out of n" property wherein the entire data can
be reconstructed by contacting any k storage nodes out of
n nodes. By utilizing a network coding approach, the notion
of functional repair is developed in [4], where the original
failed node may not be replicated exactly, but can be repaired 
such that it is functionally equivalent. On the other hand, exact 
repair requires that the regeneration process results in an exact 
replica of the data stored on the failed node. This is essential 
due to the ease of maintenance and other practical purposes 
such as maintaining a code in its systematic form. Exact repair 
may also prove to be advantageous compared to functional 
repair in the presence of eavesdroppers, as the latter scheme 
requires updating the coding coefficients, which in turn may 
leak additional information to eavesdroppers [5]. The design
of exact regenerating codes achieving one of the two ends of
the trade off between storage and repair bandwidth has been
recently investigated by researchers. In particular, Rashmi et
al. [6] design codes that are optimal for all parameters at
the minimum bandwidth regeneration (MBR) point. For the
minimum storage regeneration (MSR) point, optimal codes
are presented in multiple recent papers. (See [7]-[10] and
references therein.)

As discussed before, DSS can also exhibit multiple si- 
multaneous node failures, and it is desirable that these be 
repaired simultaneously. It is not uncommon that multiple 
failures occur in DSS, especially for large-scale systems. In 
addition, some DSS administrators may choose to wait to 
initiate a repair process after a critical number of failures has 
occurred (say t of them), in order to render the entire process 
more efficient and less frequent. For example, TotalRecall [3]
currently executes a node repair process only after a certain 
threshold on the number of failures is reached. In such multiple 
failure scenarios, each new node replacing a failed one can 
still contact d remaining (surviving) nodes to download data 
for the repair process. In addition, replacement nodes, after 
downloading data from surviving nodes, can also exchange 
data within themselves to complete the repair process. This 
repair process is referred to as cooperative repair in [11],
which presents network coding techniques to implement such
repairs. Cooperative repair is shown to be essential as it can
help in lowering the total repair bandwidth compared to the
t = 1 case. Flexibility in the choice of download nodes at
repair nodes is analyzed in [12]. The work in [13], focusing on
functional repair, shows that under the constraint n = d + 1, deliberately
delaying repairs (and thus increasing t) does not result in gains
in terms of MBR/MSR optimality. The works [13] and [14] utilize a
cut-set bound argument and derive the cooperative counterpart of
the end points of the trade off region. These two points are
named as the minimum bandwidth cooperative regenerating
(MBCR) point and the minimum storage cooperative regener- 
ating (MSCR) point (See also Q3). The work in Q4| shows 
the existence of cooperative regenerating codes with optimal 
repair bandwidth. Explicit code constructions for exact repair 
on this setup are presented in [16], for the MBCR point, and
in [17], for the MSCR point. These constructions are designed
for the setting of d = k. (See also [18].) Interference alignment
is used in [19] to construct scalar codes to operate at the
MSCR point. (This construction is limited to the case k = 2
with d > k, and does not generalize to k ≥ 3 with d > k.) An
explicit construction for the MBCR point, with the restriction
that n = d + 1 for any t > 1, is presented in [20]. Finally, the
reference [21] presents designs of scalar codes for the MBCR
point for all possible parameter values. Noting the significance 
of cooperative repair in DSS, regenerating codes that have 
resilience to eavesdropping attacks will have greater value if 
they also have efficient cooperative repair mechanisms. 

The security of systems can be understood in terms of their 
resilience to either (or both) active or passive attacks [22],
[23]. Active attacks include settings where the attacker modifies
existing packets or injects new ones into the system,
whereas passive attacks include eavesdroppers observing the 
information being stored/transmitted. For DSS, cryptographic 
approaches like private-key cryptography are often logistically 
prohibitive, as the secret key distribution between each pair 
of nodes and its renewal are highly challenging, especially 
for large-scale systems. In addition, most cryptographic ap- 
proaches are typically based on certain hardness results, which, 
if repudiated, could leave the system vulnerable to attacks. 
On the other hand, information theoretic security, see, e.g., 
[24], [25], presents secrecy guarantees even with infinite
computational power at eavesdroppers without requiring the 
sharing and/or distribution of keys. This approach is based on 
the design of secrecy-achieving coding schemes by taking into 
account the amount of information leaked to eavesdroppers, 
and can offer new solutions to security challenges in DSS. 
In its simplest form, security can be achieved with the
one-time pad scheme [26], which guarantees the security of
the ciphertext obtained by the XOR of the data and a uniform key.
This approach is of significant value to DSS. For example,
consider a system storing the key at one node, and the ciphertext
at another node. Then, the eavesdropper will not obtain any
information by observing one of these two nodes, whereas the
data collector can contact both nodes and decipher the data.
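
The two-node example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_one_time_pad` and `recover` are hypothetical, and the key is drawn with the standard library `secrets` module as a stand-in for a uniform key.

```python
import secrets

def split_one_time_pad(data: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext), where ciphertext = data XOR key.

    The key is uniform and as long as the data, so either share
    alone is uniformly distributed and reveals nothing about data.
    """
    key = secrets.token_bytes(len(data))
    ciphertext = bytes(d ^ k for d, k in zip(data, key))
    return key, ciphertext

def recover(key: bytes, ciphertext: bytes) -> bytes:
    """A data collector contacting both nodes XORs the two shares."""
    return bytes(c ^ k for c, k in zip(ciphertext, key))

# One node stores the key, the other stores the ciphertext.
node_a, node_b = split_one_time_pad(b"secret file")
assert recover(node_a, node_b) == b"secret file"
```

An eavesdropper observing only `node_a` or only `node_b` sees a uniformly distributed string, which is the property the precoding arguments later in the paper build on.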

The problem of designing secure DSS against eavesdropping
attacks has been recently studied by Pawar et al. [5],
where the authors consider a passive eavesdropper model that
observes the data stored on $\ell$ ($< k$) storage nodes for a
DSS employing an MBR code. The proposed schemes are
designed for the "bandwidth limited regime", and are shown to
achieve an upper bound on the secure file size, establishing
their optimality. Shah et al. [27] consider the design of secure
MSR codes. Here, they show that the eavesdropper model for
an MSR code should be extended compared to that of an
MBR code. The underlying reason is that at the MSR point of
operation, the eavesdropper may obtain additional information
by observing the downloaded information (as compared to just
observing the stored information). Thus, at the MSR point, the
eavesdropper is modeled with a pair $(\ell_1, \ell_2)$ with $\ell_1 + \ell_2 < k$,
where the eavesdropper has knowledge of the content of
$\ell_1$ nodes, and, in addition, has knowledge of the
downloaded information (and hence also the storage content)
of $\ell_2$ nodes. We note that, as the downloaded
data is stored for minimum bandwidth regenerating codes,
the two notions are different only at the minimum storage
point. Considering such an eavesdropper model, Shah et al.
present coding schemes utilizing product matrix codes [6],
and show that the bound on secrecy capacity in [5] at MBR is
achievable. They further use product matrix based codes for
the MSR point as well, and show that the bound is achievable
only when $\ell_2 = 0$. In addition to this classical MBR/MSR
setting, the security aspects of locally repairable codes (see,
e.g., [28]-[33]) are studied in [34], and security against active
eavesdroppers is investigated in [35]-[37].

In this paper, we analyze and design secure and cooperative 
regenerating codes for DSS. In terms of security requirements, 
we utilize a passive and colluding eavesdropper model as 
presented in [27]. In this model, during the entire life span
of the DSS, the eavesdropper can gain access to data stored
on $\ell_1$ nodes, and, in addition, it observes both
the stored content and the data downloaded (for repair) on an
additional $\ell_2$ nodes. Given this eavesdropper model,
we focus on the problem of designing secure regenerating 
codes in the context of DSS that performs multiple node 
repairs in a cooperative manner. This scenario generalizes 
the single node repair setting considered in earlier works 
to multiple node failures. First, we present an upper bound on
the secrecy capacity for MBCR codes, and present a secure
coding scheme that achieves this bound. This proves the
tightness of the bound and characterizes the secrecy capacity
for MBCR codes. Next, we address the secrecy capacity of a
DSS employing MSCR codes, and show that the existing
MSCR codes can be made secure against eavesdropping. In
this minimum storage setup, our codes match the upper bound
on the secure file size in special cases. In all scenarios, the
achievability results allow for exact repair, and secure file
size upper bounds are obtained from mincut analyses over
the secrecy graph representation of DSS. The main secrecy 
achievability coding argument of the paper is obtained by 
utilizing a secret precoding scheme to obtain secure coding 
schemes for DSS. In some cases, this precoding is established 
simply with the one-time pad scheme, and in others maximum 
rank distance (MRD) codes are utilized similar to the classical 
work of [38].

The rest of the paper is organized as follows. In Section 
II, we provide the general system model together with some 
preliminary results utilized throughout the text. Section III 
provides the analysis of secure MBCR codes, and Section IV 
is devoted to the secure MSCR codes. The paper is concluded 
in Section V, and, to enhance the flow of the paper, some of 
the results and proofs are relegated to appendices. 

II. System Model and Preliminaries

Consider a DSS with n live nodes at a time and a file
$\mathbf{f}$ of size $\mathcal{M}$ over $\mathbb{F}_q$ that needs to be stored on the DSS.
In order to store the file $\mathbf{f}$, it is divided into k blocks
of size $\mathcal{M}/k$ each. Let $(\mathbf{f}_1, \ldots, \mathbf{f}_k)$ denote these k blocks.
Here, we have $\mathbf{f}_i \in \mathbb{F}_q^{\mathcal{M}/k}$. These k data blocks are encoded
into n data blocks, $(\mathbf{x}_1, \ldots, \mathbf{x}_n)$, each of length $\alpha$ over $\mathbb{F}_q$
($\alpha \ge \mathcal{M}/k$). Given the codewords, node i in an n-node DSS
stores encoded block $\mathbf{x}_i$. In this paper, we use $\mathbf{x}_i$ to represent
both block $\mathbf{x}_i$ and a storage node storing this encoded block
interchangeably. Motivated by the MDS property of the codes
that are traditionally proposed to store data in centralized
storage systems [39]-[41], the works on regenerating codes
focus on storage schemes that have "any k out of n" property, 
i.e., the content of any k nodes will suffice to recover the file. 
We focus on codes achieving this property. 

We use the following notation throughout the text. We
usually denote vectors by lower-case bold letters, and sets
and subspaces by calligraphic fonts. For $a < b$, $[a : b]$ represents the set
of numbers $\{a, a+1, \ldots, b\}$. (This is shortened as $[b]$ for
$[1 : b]$, and brackets are omitted in subscripts to improve
readability.) The symbols stored at node i are represented by
the vector $\mathbf{s}_i$, the symbols transmitted from node i to node
j are denoted as $\mathbf{d}_{i,j}$, and the set $\mathbf{d}_j$ is used to denote all of
the symbols downloaded to node j. The DSS is initialized with
the n nodes containing encoded symbols, i.e., $\mathbf{s}_i = \mathbf{x}_i$ for
$i = 1, \ldots, n$.

A. Cooperative repair in DSS 

In most of the studies on DSS, exact repair for regenerating 
codes is analyzed in the context of single node failure. 
However, it is not uncommon to see simultaneous multiple 
node failures in storage networks, especially for large ones. 
The basic setup involves the simultaneous repair of t (possibly 
greater than one) failed nodes. After the failure of t storage 
nodes, the same number of newcomer nodes are introduced to 
the system. Each such node contacts d live storage nodes
and downloads $\beta$ symbols from each of these nodes. In addition,
utilizing a cooperative approach, each newcomer node
also contacts the other nodes under repair and downloads
$\beta'$ symbols from each of them. Hence, the total repair bandwidth
is given by

\[ \gamma = d\beta + (t-1)\beta'. \tag{1} \]

Each newcomer node, to repair the i-th node of the original
network, uses these $d\beta + (t-1)\beta'$ downloaded
symbols to regenerate $\alpha$ symbols, $\mathbf{x}_i$, and stores these symbols.
This exact repair process preserves the MDS property, i.e., data
stored on any k nodes (potentially including the nodes that
are repaired) allows the original file $\mathbf{f}$ to be reconstructed. See
Fig. 1.

We remark that, as also argued in [21], $d \ge k$ can be
assumed without loss of generality. (Earlier papers on the
subject assumed the $d \ge k$ case, and noted that this is assumed for
simplicity. See, e.g., [16]-[20].) Remarkably, if $d < k$, a data
collector can reconstruct the whole file by contacting only d
nodes, as from these nodes the other nodes can be repaired
in groups of size t. Thus, any (n, k, d) code with $d < k$ can
be reduced to an (n, k' = d, d) code. Therefore, without loss of
generality, we will assume $d \ge k$.

Fig. 1: Information flow graph of DSS implementing cooperative
repair. In this representative example, we have n = 5,
d = k = 3, and t = 2. Accordingly, after a failure of two
nodes, namely node 1 and node 2, the system cooperatively
repairs these two nodes as node 6 and node 7. Downloads from
live nodes (blue) and from cooperative repair pairs (green) are
shown. Due to exact repair, the network will repair the nodes
to satisfy $x_6^{\mathrm{out}} = x_1^{\mathrm{out}}$ and $x_7^{\mathrm{out}} = x_2^{\mathrm{out}}$.

B. Information flow graph 

In their seminal work [4], Dimakis et al. model the operation
of a DSS as a multicasting problem over an information
flow graph. (See Figs. 1 and 2 for the flow graph in the
cooperative setting.) The information flow graph consists of three
types of nodes:

• Source node (S): The source node contains the original
file $\mathbf{f}$, which is $\mathcal{M}$ symbols long. The source node is connected to n nodes.

• Storage nodes ($(x_i^{\mathrm{in}}, x_i^{\mathrm{co}}, x_i^{\mathrm{out}})$): In the information flow
graph associated with cooperative regenerating codes, we
represent each node with a combination of three sub-nodes:
$x^{\mathrm{in}}$, $x^{\mathrm{co}}$, and $x^{\mathrm{out}}$. Here, $x^{\mathrm{in}}$ is the sub-node
having the connections from the live nodes, $x^{\mathrm{co}}$ is the
sub-node having the connections from the nodes under
repair in the same repair group, and $x^{\mathrm{out}}$ is the storage
sub-node, which stores the data and is contacted by a data
collector or other nodes under repair. $x^{\mathrm{in}}$ is connected to
$x^{\mathrm{co}}$ with a link of infinite capacity, and $x^{\mathrm{co}}$ is connected to
$x^{\mathrm{out}}$ with a link of capacity $\alpha$. We represent cuts with
a notation with bars as in $(x^{\mathrm{in}}, x^{\mathrm{co}} | x^{\mathrm{out}})$, meaning the
cut is passing through the link between $x^{\mathrm{co}}$ and $x^{\mathrm{out}}$.
(See Fig. 2.) The nodes on the right hand side of the cuts
belong to the data collector side, represented by the set $\mathcal{D}$,
whereas the nodes belonging to the left hand side of the
cuts belong to $\mathcal{D}^c$, the source side. For a newcomer node,
$x^{\mathrm{in}}$ is connected to the $x^{\mathrm{out}}$ sub-nodes of d live nodes with
links of capacity $\beta$ symbols each, representing the data
downloaded during node repair. This newcomer node also
connects to the $x^{\mathrm{in}}$ sub-nodes of $(t - 1)$ nodes being repaired
in the same group, each having a link capacity of $\beta'$.

• Data collector node(s) (DC): Each data collector contacts
the $x^{\mathrm{out}}$ sub-nodes of k live nodes with edges each having $\infty$
link capacity.

Fig. 2: Information flow graph of DSS implementing cooperative
repair under security constraints. In this representative
example, we have n = 5, d = k = 3, and t = 2.
Multiple repair stages and a cut, represented by a dotted line,
through the nodes connected to the DC are shown. The
figure has different cut types: The first repaired node has
a cut of type $(| x^{\mathrm{in}}, x^{\mathrm{co}}, x^{\mathrm{out}})$ and the second has a cut of
type $(x^{\mathrm{in}}, x^{\mathrm{co}} | x^{\mathrm{out}})$. Nodes that are being eavesdropped are
indicated with dashed-dotted lines. Here, both the content and
the downloads of the first repaired node are observed by the
eavesdropper ($\ell_2 = 1$), and only the content of the last repaired
node is observed by the eavesdropper ($\ell_1 = 1$). Accordingly, the
eavesdropper has observations of $d\beta + (t-1)\beta'$ downloaded
symbols from the first repaired node, and has $\alpha$
symbols from the last repaired node.



C. MBCR and MSCR points 

With the aforementioned values of capacities of various
edges in the information flow graph, the DSS is said to
employ an $(n, k, d, \alpha, \beta, \beta')$ code. For a given graph $\mathcal{G}$ and
data collectors $\mathrm{DC}_i$, the file size that can be stored in such a
DSS can be bounded using the max flow-min cut theorem for
multicasting utilized in network coding [42], [43].

Lemma 1 (Max flow-min cut theorem for multicasting [4],
[42], [43]).

\[ \mathcal{M} \le \min_{\mathcal{G}} \min_{\mathrm{DC}_i} \mathrm{maxflow}(S \to \mathrm{DC}_i, \mathcal{G}), \]

where $\mathrm{flow}(S \to \mathrm{DC}_i, \mathcal{G})$ represents the flow from the source
node S to data collector $\mathrm{DC}_i$ over the graph $\mathcal{G}$.

Therefore, e.g., for the graph in Fig. 2, a file of $\mathcal{M}$ symbols
can be delivered to a data collector DC only if the min cut
is at least $\mathcal{M}$.



Dimakis et al. [4] consider k successive node failures
and evaluate the min-cut over possible graphs, and obtain the
following bound (for the t = 1 case).

\[ \mathcal{M} \le \sum_{i=0}^{k-1} \min\{\alpha, (d - i)\beta\} \tag{2} \]

We emphasize that the min-cut for this (t = 1) case is
given by the scenario where k successively repaired nodes are
connected to DC, and, for each successive repair, the repaired
node i + 1 also connects to i previously repaired
nodes. Hence, for each DC-connected node, the cut value is
equal to $(d - i)\beta$ if the cut is of type $(| x^{\mathrm{in}}, x^{\mathrm{out}})$, and is equal
to $\alpha$ if the cut is of type $(x^{\mathrm{in}} | x^{\mathrm{out}})$. (Note that $x^{\mathrm{co}}$ does not
appear here as the model considered in [4] does not involve
cooperative repair.) The codes that attain the bound in (2) are
named as regenerating codes [4].
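
As a small illustration of the bound in (2), the following Python sketch evaluates the sum for given parameters. The function name is hypothetical, and the normalization $\beta = 1$ in the examples is an assumption made here for convenience.

```python
def single_repair_file_size_bound(k, d, alpha, beta):
    """Evaluate the t = 1 min-cut bound (2): sum_i min{alpha, (d - i) beta}."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

# Assumed normalization beta = 1 with k = 3, d = 4.
# Storing alpha = d * beta (minimum bandwidth end of the trade off):
mbr_size = single_repair_file_size_bound(k=3, d=4, alpha=4, beta=1)
# Storing alpha = (d - k + 1) * beta (minimum storage end of the trade off):
msr_size = single_repair_file_size_bound(k=3, d=4, alpha=2, beta=1)
```

In the second call every term of the sum is limited by $\alpha$, so the bound equals $k\alpha$, which is the minimum storage behavior.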

For the cooperative scenario, we consider the secure file size
upper bound in the next section using similar min cut arguments
in the presence of eavesdroppers. Removing the leakage
(to the eavesdropper) terms, one obtains the min cut file size
bound for the cooperative scenario. In particular, a file size
bound in the cooperative setting is obtained as follows.

\[ \mathcal{M} \le \sum_{i=0}^{g-1} u_i \min\Big\{ \alpha, \Big(d - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - u_i)\beta' \Big\}, \tag{3} \]

where $u_i \in [0 : t]$ is the number of repaired nodes in repair
group $i \in [0 : g-1]$ that are connected to DC. Similar to the
t = 1 case described above, the cut of type $(x^{\mathrm{in}}, x^{\mathrm{co}} | x^{\mathrm{out}})$
has a value of $\alpha$. The cut of type $(| x^{\mathrm{in}}, x^{\mathrm{co}}, x^{\mathrm{out}})$, on the
other hand, has a value of $(t - u_i)\beta'$ due to the links coming
from the nodes under repair that are not connected to DC and an
additional value of $(d - \sum_{j=0}^{i-1} u_j)\beta$ due to the connections to
the previously repaired live nodes that are not contacted by
DC. (Here, we again subtract the values of the flows from the
nodes already belonging to the data collector side, $\mathcal{D}$.) The
cut of type $(x^{\mathrm{in}} | x^{\mathrm{co}}, x^{\mathrm{out}})$ has a value of $\infty$ and hence does
not appear in the min-cut.
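
The bound in (3) can be explored numerically with a short brute-force sketch. The function names are hypothetical. The enumeration assumes that the $u_i$ sum to k and skips groups with $u_i = 0$, since such groups contribute nothing to the sum.

```python
from itertools import product

def cooperative_cut_value(u, d, t, alpha, beta, beta_p):
    """Right-hand side of (3) for one repair pattern u = (u_0, ..., u_{g-1})."""
    total, repaired = 0, 0
    for u_i in u:
        per_node = min(alpha, (d - repaired) * beta + (t - u_i) * beta_p)
        total += u_i * per_node
        repaired += u_i  # nodes of earlier groups already on the DC side
    return total

def cooperative_file_size_bound(k, d, t, alpha, beta, beta_p):
    """Minimize (3) over all patterns with 1 <= u_i <= t and sum(u) = k."""
    best = None
    for g in range(1, k + 1):
        for u in product(range(1, t + 1), repeat=g):
            if sum(u) == k:
                value = cooperative_cut_value(u, d, t, alpha, beta, beta_p)
                best = value if best is None else min(best, value)
    return best

# Example with d = k = 3, t = 2 and alpha = 7, beta = 2, beta' = 1.
bound = cooperative_file_size_bound(k=3, d=3, t=2, alpha=7, beta=2, beta_p=1)
```

For these parameters every repair pattern yields the same cut value, 15, which equals $k(2d - k + t)$.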

Note that, given a file size $\mathcal{M}$, there is an inherent trade
off between storage per node $\alpha$ and repair bandwidth
$\gamma = d\beta + (t-1)\beta'$. This trade off, for the cooperative setting, can
be established using an analysis similar to the one leading to the MBR/MSR
points from equation (2). Two classes of codes that
achieve the two extreme points of this trade off are named
minimum bandwidth cooperative regenerating (MBCR) codes
and minimum storage cooperative regenerating (MSCR) codes.
The former is obtained by first finding the minimum possible
$\gamma$ and then finding the minimum $\alpha$ satisfying (3). This point
is given by the following.

\[ \alpha_{\mathrm{MBCR}} = \gamma_{\mathrm{MBCR}} = \frac{\mathcal{M}}{k} \frac{2d + t - 1}{2d + t - k}, \quad \beta_{\mathrm{MBCR}} = \frac{\mathcal{M}}{k} \frac{2}{2d + t - k}, \quad \beta'_{\mathrm{MBCR}} = \frac{\mathcal{M}}{k} \frac{1}{2d + t - k}. \tag{4} \]



Fig. 3: Storage vs. repair bandwidth trade off for cooperative
regenerating codes. The repair bandwidth is given by
$\gamma = d\beta + (t-1)\beta'$. (Horizontal axis: repair bandwidth $\gamma$; the
MBCR and MSCR points are marked.)

The MSCR point, on the other hand, is obtained by first choosing
the minimum storage per node (i.e., $\alpha = \mathcal{M}/k$), and then
minimizing $\gamma$ (via choosing the minimum possible $\beta$-$\beta'$ pair)
satisfying the min cut (3).



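\[ \alpha_{\mathrm{MSCR}} = \frac{\mathcal{M}}{k}, \quad \gamma_{\mathrm{MSCR}} = \frac{\mathcal{M}}{k} \frac{d + t - 1}{d + t - k}, \quad \beta_{\mathrm{MSCR}} = \beta'_{\mathrm{MSCR}} = \frac{\mathcal{M}}{k} \frac{1}{d + t - k}. \tag{5} \]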



We depict these two trade off points, which are directly
computable from (3), in Fig. 3. (We refer the reader to the
works [13], [14] for a detailed derivation of these two points.
See also Q2) for an analysis for the simplified case of when 
$t \mid k$, i.e., the number of groups satisfies g = k/t.) Note that,
when t = 1, these two points correspond to the MBR/MSR points
characterized in [4].
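
The two operating points in (4) and (5) are simple to compute exactly. The sketch below uses rational arithmetic; the function names and the dictionary layout are illustrative assumptions, not notation from the cited works.

```python
from fractions import Fraction

def mbcr_point(M, k, d, t):
    """Parameters of the MBCR point (4) for file size M."""
    unit = Fraction(M, k) / (2 * d + t - k)
    beta, beta_p = 2 * unit, unit
    alpha = d * beta + (t - 1) * beta_p  # nodes store what they download
    return {"alpha": alpha, "beta": beta, "beta_p": beta_p, "gamma": alpha}

def mscr_point(M, k, d, t):
    """Parameters of the MSCR point (5) for file size M."""
    unit = Fraction(M, k) / (d + t - k)
    return {
        "alpha": Fraction(M, k),
        "beta": unit,
        "beta_p": unit,
        "gamma": (d + t - 1) * unit,
    }

# Example: M = 15, k = d = 3, t = 2.
mbcr = mbcr_point(15, 3, 3, 2)   # alpha = gamma = 7, beta = 2, beta' = 1
mscr = mscr_point(15, 3, 3, 2)   # alpha = 5, gamma = 10, beta = beta' = 5/2
```

At the MBCR point the storage equals the repair bandwidth, whereas at the MSCR point the repair bandwidth exceeds the storage whenever k > 1, which is why the eavesdropper model distinguishes stored from downloaded data.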

D. Eavesdropper model 

We consider an $(\ell_1, \ell_2)$ eavesdropper, which can access the
stored data of nodes in the set $\mathcal{E}_1$, and additionally can access
both the stored and downloaded data at the nodes in the set
$\mathcal{E}_2$, where $\ell_1 = |\mathcal{E}_1|$ and $\ell_2 = |\mathcal{E}_2|$. Hence, the eavesdropper
has access to $x_i^{\mathrm{out}}$ for $i \in \mathcal{E}_1$ and $x_j^{\mathrm{in}}, x_j^{\mathrm{co}}, x_j^{\mathrm{out}}$ for $j \in \mathcal{E}_2$.
(See Fig. 2.) This is the eavesdropper model defined in
[27] (adapted here to the cooperative repair setting), which
generalizes the eavesdropper model considered in [5]. The
eavesdropper is assumed to know the coding scheme employed
by the DSS. At the MBCR point, a newcomer downloads
$\alpha_{\mathrm{MBCR}} = \gamma_{\mathrm{MBCR}}$ amount of data. Thus, an eavesdropper does
not gain any additional information if it is allowed to access
the data downloaded during repair. However, at the MSCR
point, the repair bandwidth is strictly greater than the per node
storage, $\alpha_{\mathrm{MSCR}}$, and an eavesdropper potentially gains more
information if it has access to the data downloaded during 
node repair as well. We summarize the eavesdropper model 
together with the definition of achievability of a secure file 
size in the following. 

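Definition 2 (Security against an $(\ell_1, \ell_2)$ eavesdropper). A
DSS is said to achieve a secure file size of $\mathcal{M}^s$ against an
$(\ell_1, \ell_2)$ eavesdropper, if, for any sets $\mathcal{E}_1$ and $\mathcal{E}_2$ of size $\ell_1$ and
$\ell_2$, respectively, $I(\mathbf{f}^s; \mathbf{e}) = 0$. Here $\mathbf{f}^s$ is the secure file of size
$\mathcal{M}^s$, which is first encoded to file $\mathbf{f}$ of size $\mathcal{M}$ before storing
into DSS, and $\mathbf{e}$ is the eavesdropper observation vector given
by $\{x_i^{\mathrm{out}}, x_j^{\mathrm{in}}, x_j^{\mathrm{co}}, x_j^{\mathrm{out}} : i \in \mathcal{E}_1, j \in \mathcal{E}_2\}$.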



We remark that, as it will be clear from the following
sections, when a file $\mathbf{f}$ of size $\mathcal{M}$ is stored in DSS and the
secure file size achieved is $\mathcal{M}^s$, the remaining $\mathcal{M} - \mathcal{M}^s$
symbols can be utilized as public data, which does not have
security constraints. Yet, noting the possibility of storing the 
public data, we will refer to this uniformly distributed part 
as the random data, which is utilized to achieve security. 
Finally, we note the following lemma, which will be used 
in the following parts of the sequel. 

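Lemma 3 (Secrecy Lemma). Consider a system with information
bits $\mathbf{u}$, random bits $\mathbf{r}$ (independent of $\mathbf{u}$), and an
eavesdropper with observations given by $\mathbf{e}$. If $H(\mathbf{e}) \le H(\mathbf{r})$
and $H(\mathbf{r} | \mathbf{u}, \mathbf{e}) = 0$, then $I(\mathbf{u}; \mathbf{e}) = 0$.

Proof: See Appendix A. ■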
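
The conditions of Lemma 3 can be checked numerically on a toy instance. The sketch below assumes one uniform information bit u, one independent uniform random bit r, and the observation e = u XOR r; this choice of example and the helper names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(samples):
    """Shannon entropy in bits of equally likely samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def check_secrecy_lemma():
    # All (u, r) pairs are equally likely; e is the eavesdropper observation.
    outcomes = [(u, r, u ^ r) for u, r in product((0, 1), repeat=2)]
    h_u = entropy([u for u, _, _ in outcomes])
    h_r = entropy([r for _, r, _ in outcomes])
    h_e = entropy([e for _, _, e in outcomes])
    h_ue = entropy([(u, e) for u, _, e in outcomes])
    h_ure = entropy(outcomes)
    h_r_given_ue = h_ure - h_ue          # H(r | u, e)
    mutual_info = h_u + h_e - h_ue       # I(u; e)
    return h_e, h_r, h_r_given_ue, mutual_info

h_e, h_r, h_r_given_ue, mutual_info = check_secrecy_lemma()
```

Here both hypotheses hold, H(e) = H(r) = 1 bit and H(r | u, e) = 0, and the computed mutual information I(u; e) is zero, in agreement with the lemma.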

III. Secure MBCR codes 

In this section, we study secure minimum bandwidth coop- 
erative regenerating codes. We first present an upper bound on 
the secure file size that can be supported by an MBCR code. 
Then, we present exact repair coding schemes achieving the 
derived bound. In addition, we analyze how the cooperation 
affects the penalty paid in securing storage systems. 

A. Upper bound on secure file size of MBCR codes 

Analysis of the cut-set bounds for cooperative regenerating
codes is provided in [13], [14]. (See also the arguments given
in £□], OH. Here, we follow the notations of HD, 03.) 
We consider groups of nodes being repaired, and denote the
number of nodes that are repaired in group i and
contacted by the data collector as $u_i$ such that

\[ u_i \in [t], \ \forall i = 0, 1, \ldots, g-1, \qquad \sum_{i=0}^{g-1} u_i = k, \]

where g is the total number of groups that have been repaired. 
While evaluating an upper bound on the file size that can 
be securely stored on the DSS, the data collector under 
consideration is assumed to contact only these k nodes that 
belong to one of these g groups. 

We consider two types of cuts: $m_i$ nodes have
the first cut type $(x^{\mathrm{in}}, x^{\mathrm{co}} | x^{\mathrm{out}})$, and $u_i - m_i$ nodes
have the second cut type $(| x^{\mathrm{in}}, x^{\mathrm{co}}, x^{\mathrm{out}})$, $0 \le i \le g-1$. Note
that the cuts of the form $(x^{\mathrm{in}}, x^{\mathrm{co}} | x^{\mathrm{out}})$ give a cut value of $\alpha$
as opposed to $(x^{\mathrm{in}} | x^{\mathrm{co}}, x^{\mathrm{out}})$, which has a cut value larger than
$\alpha$. Since we are interested in the cuts of smaller size, we do
not consider the cuts $(x^{\mathrm{in}} | x^{\mathrm{co}}, x^{\mathrm{out}})$.

We consider $\ell_1$ colluding eavesdroppers, each
observing the contents of a different node. Note that, for the MBCR
point analysis, we can consider $\ell_2 = 0$ without loss of
generality, as the amount of data a particular node stores is
equal to the amount of data it downloads during its repair. We
denote the number of eavesdroppers on the nodes in the first
cut type as $l_i^1$, $0 \le i \le g-1$, and denote the number
of eavesdroppers on the nodes in the second cut type as
$l_i^2$, $0 \le i \le g-1$, such that

\[ l_i^1 \le m_i, \qquad l_i^2 \le u_i - m_i, \qquad \sum_{i=0}^{g-1} (l_i^1 + l_i^2) = \ell_1. \]

Thus, for group i, due to the eavesdroppers, the nodes that
belong to the first type can only add the value of $(m_i - l_i^1)\alpha$
to the cut. The second type, on the other hand, consists of
$u_i - m_i$ nodes, out of which $l_i^2$ are eavesdropped.
As the data downloaded is equal to the data stored at the MBCR
point, the nodes that are eavesdropped do not add a value to
the cut. Each of the remaining $u_i - m_i - l_i^2$ nodes contacts
d live nodes, $\sum_{j=0}^{i-1} u_j$ of which belong to the previous
groups being repaired. In addition, these nodes contact $t - 1$
nodes from the same repair group, out of which $u_i - m_i - 1$
nodes belong to $\mathcal{D}$. Accordingly, this cut-set bound
is given by the following.

\[ \mathcal{M}^s \le \sum_{i=0}^{g-1} \big( (m_i - l_i^1)\alpha + (u_i - m_i - l_i^2) C_i \big), \tag{6} \]

where

\[ C_i = \Big(d - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - u_i + m_i)\beta'. \]

Each summation term in (6) is concave for $m_i \in [0, u_i]$.
We consider two scenarios in (6): (i) $m_i = 0$, $l_i^2 = l_i$ and
(ii) $m_i = u_i$, $l_i^1 = l_i$, where $l_i = l_i^1 + l_i^2$ is the number
of eavesdropped nodes in group i. Hence, we obtain

\[ \mathcal{M}^s \le \sum_{i=0}^{g-1} (u_i - l_i) \min\Big\{ \alpha, \Big(d - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - u_i)\beta' \Big\}. \tag{7} \]
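
To make the role of the eavesdropper allocation in (7) concrete, the following sketch evaluates the bound for a given repair pattern and searches over allocations by brute force. The function names are hypothetical, and the search assumes $l_i \le u_i$ (an eavesdropper cannot remove more DC-connected nodes than a group has), which is an assumption of this illustration.

```python
from itertools import product

def secure_cut_value(u, l, d, t, alpha, beta, beta_p):
    """Right-hand side of (7) for pattern u and eavesdropper allocation l."""
    total, repaired = 0, 0
    for u_i, l_i in zip(u, l):
        per_node = min(alpha, (d - repaired) * beta + (t - u_i) * beta_p)
        total += (u_i - l_i) * per_node  # eavesdropped nodes add nothing
        repaired += u_i
    return total

def worst_case_secure_cut(u, ell_1, d, t, alpha, beta, beta_p):
    """Smallest value of (7) over allocations with l_i <= u_i, sum(l) = ell_1."""
    allocations = (
        l for l in product(*(range(u_i + 1) for u_i in u)) if sum(l) == ell_1
    )
    return min(
        secure_cut_value(u, l, d, t, alpha, beta, beta_p) for l in allocations
    )

# Example: d = k = 3, t = 2, alpha = 7, beta = 2, beta' = 1, ell_1 = 1.
params = dict(d=3, t=2, alpha=7, beta=2, beta_p=1)
one_by_one = worst_case_secure_cut((1, 1, 1), 1, **params)
pair_first = worst_case_secure_cut((2, 1), 1, **params)
```

In this example the pattern with one node repaired per group gives the smaller value, since the eavesdropper can remove the node with the largest cut contribution.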

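Note that, at the MBCR point, the nodes store what they
download; therefore, the MBCR codes should satisfy

\[ \alpha = d\beta + (t-1)\beta'. \tag{8} \]

Utilizing this, we consider the following cases of (7).

Case 1: $g = k$, $u_i = 1$, $\forall i = 0, \ldots, k-1$

\[ \mathcal{M}^s \le \sum_{i=0}^{k-1} (1 - l_i)\big((d - i)\beta + (t-1)\beta'\big) \tag{9} \]

Here, the minimum cut value corresponds to having $l_i = 1$
for $i = 0, 1, \ldots, \ell_1 - 1$, and $l_i = 0$ otherwise. Hence, we get

\[ \mathcal{M}^s \le \sum_{i=\ell_1}^{k-1} \big((d - i)\beta + (t-1)\beta'\big) \tag{10} \]

\[ = \sum_{i=\ell_1}^{k-1} (d - i)\beta + (k - \ell_1)(t-1)\beta'. \tag{11} \]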

Case 2: If $t \ge k$, $g = 1$, $u_0 = k$

\[ \mathcal{M}^s \le (k - \ell_1)\big(d\beta + (t - k)\beta'\big) \tag{12} \]

Case 3: If $t < k$, $g = \lfloor k/t \rfloor + 1$, $u_i = t$ for $i = 0, \ldots, g-2$,
and $u_{g-1} = k - \lfloor k/t \rfloor t$

Let $a = \lfloor k/t \rfloor$ and $b = k - at$, so that $k = at + b$. From
(7), we obtain

\[ \mathcal{M}^s \le \sum_{i=0}^{a-1} (t - l_i)(d - it)\beta + (b - l_a)\{(d - at)\beta + (t - b)\beta'\}. \tag{13} \]

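Considering possible allocations of eavesdroppers in this
bound, i.e., $\{l_i\}_{i=0}^{a}$, we obtain the following bound (where
we collect eavesdropper dependent terms in the variable S
given below).

\[ \mathcal{M}^s \le \beta\Big(kd + \frac{(k - b)(t - k - b)}{2}\Big) + \beta' b(t - b) - S, \tag{14} \]

where S is given by

\[ S = \max_{l_i \le t \ \text{s.t.} \ \sum_{i=0}^{a} l_i = \ell_1} \ \sum_{i=0}^{a-1} l_i (d - it)\beta + l_a\{(d - at)\beta + (t - b)\beta'\}. \tag{15} \]

If $\ell_1 \le at = k - b$, this evaluates to

\[ S = \sum_{i=0}^{\lfloor \ell_1/t \rfloor - 1} t(d - it)\beta + (\ell_1 - \lfloor \ell_1/t \rfloor t)(d - \lfloor \ell_1/t \rfloor t)\beta = \beta \ell_1 (d - \lfloor \ell_1/t \rfloor t) + \frac{\beta t^2}{2} \lfloor \ell_1/t \rfloor (\lfloor \ell_1/t \rfloor + 1), \]

and, if $\ell_1 > at$, to

\[ S = \sum_{i=0}^{a-1} t(d - it)\beta + (\ell_1 - at)(d - at)\beta + (\ell_1 - at)(t - b)\beta' = \beta \ell_1 (d - at) + \frac{\beta t^2}{2} a(a + 1) + (\ell_1 - at)(t - b)\beta'. \]

Note that we consider the worst case eavesdropper allocation
to maximize S in the above derivation.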

The normalized values at the MBCR point are given by

\[ \beta' = 1, \quad \beta = 2, \quad \alpha = \gamma = 2d + t - 1, \quad \mathcal{M} = k(2d - k + t). \tag{16} \]

Using this and the bounds given in (11), (12), and (14), we
get a bound on the secure file size at the MBCR point. We
state this result in the following.

Proposition 4. Cooperative regenerating codes operating at 
the MBCR point with a secure file size of M s satisfy 

M s < k(2d-k + t) -h(2d-h +t) 

= (k-h){2d + t-k-£i), (17) 
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and the MBCR point is given by j3' = 1, P = 2, a = 7 = 
2d + t - 1 for a file size of M = k(2d -k + t). 

Proof: We show that (fT2] i and ( TBI result in loose bounds 
compared to that of ( fTTT > in Appendix [B] And, (fTTT t evaluates 
to the stated bound at the MBCR point. ■ 
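As a quick numerical sanity check of Proposition 4, the following sketch evaluates the Case 1 bound (11) and the Case 2 bound (12) at the normalized MBCR point ($\beta = 2$, $\beta' = 1$) and compares them with the closed form in (17). The function names are illustrative and do not come from the paper.

```python
def bound_case1(k, d, t, l1, beta=2, beta_p=1):
    """Bound (11): one node per repair stage, eavesdroppers on the first l1 stages."""
    return sum((d - i) * beta for i in range(l1, k)) + (k - l1) * (t - 1) * beta_p


def bound_case2(k, d, t, l1, beta=2, beta_p=1):
    """Bound (12), stated for t >= k: a single repair stage containing all k nodes."""
    return (k - l1) * (d * beta + (t - k) * beta_p)


def closed_form(k, d, t, l1):
    """Right-hand side of (17)."""
    return (k - l1) * (2 * d + t - k - l1)


def check_proposition4(max_d=8, max_t=8):
    """Return True if (11) matches (17) and (12) is never tighter, over a small grid."""
    for d in range(1, max_d + 1):
        for k in range(1, d + 1):          # d >= k
            for l1 in range(0, k):         # 0 <= l1 < k
                for t in range(1, max_t + 1):
                    if bound_case1(k, d, t, l1) != closed_form(k, d, t, l1):
                        return False
                    if t >= k and bound_case2(k, d, t, l1) < closed_form(k, d, t, l1):
                        return False
    return True
```

For example, `closed_form(3, 4, 2, 1)` gives the secure file size bound for $k = 3$, $d = 4$, $t = 2$, $\ell_1 = 1$.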

B. Code construction for secure MBCR when n = d + t 

We consider secrecy precoding of the data at hand before storing it on the DSS nodes using an MBCR code. We establish this precoding with maximum rank distance (MRD) codes. In vector representation, assuming $m \ge n$, the norm of a vector $\mathbf{v} \in \mathbb{F}_{q^m}^n$ is the column rank of $\mathbf{v}$ over the base field $\mathbb{F}_q$, denoted by $Rk(\mathbf{v}|\mathbb{F}_q)$. (This is the maximum number of linearly independent coordinates of $\mathbf{v}$ over the base field $\mathbb{F}_q$, for a given basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. A basis also establishes an isomorphism between $n$-length vectors, in $\mathbb{F}_{q^m}^n$, and $m \times n$ matrices, in $\mathbb{F}_q^{m \times n}$.) Rank distance between two vectors is defined by $d(\mathbf{v}_1, \mathbf{v}_2) = Rk(\mathbf{v}_1 - \mathbf{v}_2|\mathbb{F}_q)$. (In matrix representation, this is equivalent to the rank of the difference of the two corresponding matrices of the vectors.) An $[n, k, d]$ MRD code over the extension field $\mathbb{F}_{q^m}$ achieving the maximum rank distance $d = n - k + 1$ (for $m \ge n$) can be constructed with the following linearized polynomial. (This is referred to as the Gabidulin construction of MRD codes, or Gabidulin codes [44]-[47].)

$$f(g) = \sum_{i=0}^{k-1} u_i g^{[i]}, \tag{18}$$

where $[i] = q^i$, and $g, u_i \in \mathbb{F}_{q^m}$. Then, given $n$ linearly independent elements over $\mathbb{F}_q$, $\{g_1, \dots, g_n\}$ with $g_j \in \mathbb{F}_{q^m}$, the codewords for a given set of $k$ elements, $u_i \in \mathbb{F}_{q^m}$, $i = [0 : k-1]$, are obtained by $x_j = f(g_j) = \sum_{i=0}^{k-1} u_i g_j^{[i]}$ for $j = [1 : n]$. (With generator matrix representation, we have $\mathbf{x} = \mathbf{u}\mathbf{G}$, where $\mathbf{G} = [g_1, \dots, g_n; \dots; g_1^{[k-1]}, \dots, g_n^{[k-1]}]$.) We also note that the linearized polynomial satisfies $f(a_1 g_1 + a_2 g_2) = a_1 f(g_1) + a_2 f(g_2)$ for any $a_1, a_2 \in \mathbb{F}_q$ and $g_1, g_2 \in \mathbb{F}_{q^m}$, and this will be utilized in the following.
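The $\mathbb{F}_q$-linearity of the linearized polynomial in (18) can be illustrated on a toy field. The sketch below is only an illustration under stated assumptions: it fixes $q = 2$ and $m = 4$, represents $\mathbb{F}_{2^4}$ as integers 0..15 with the (assumed) modulus $x^4 + x + 1$, and uses hypothetical helper names. With $q = 2$, the scalars $a_1, a_2$ are 0 or 1, so linearity reduces to additivity under XOR.

```python
from itertools import product

M_BITS = 4        # extension degree m (assumption for this toy example), q = 2
IRRED = 0b10011   # x^4 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two elements of GF(2^4) given as integers in 0..15."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M_BITS:       # reduce modulo the irreducible polynomial
            a ^= IRRED
    return r


def frobenius(g, i):
    """Return g^(q^i) with q = 2, i.e., square g exactly i times."""
    for _ in range(i):
        g = gf_mul(g, g)
    return g


def lin_poly(u, g):
    """Evaluate f(g) = sum_i u_i * g^(q^i); field addition is XOR."""
    acc = 0
    for i, ui in enumerate(u):
        acc ^= gf_mul(ui, frobenius(g, i))
    return acc


def encode(u, points):
    """Gabidulin-style encoding: evaluate f at each evaluation point."""
    return tuple(lin_poly(u, g) for g in points)


def distinct_codewords(points, k):
    """Count distinct codewords over all messages of length k."""
    return len({encode(u, points) for u in product(range(1 << M_BITS), repeat=k)})
```

With the points `[1, 2]` (linearly independent over $\mathbb{F}_2$) and $k = n = 2$, every message maps to a different codeword, which mirrors the rate-one $[\mathcal{M}, \mathcal{M}, 1]$ precoding used in this section.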

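Consider now the MBCR point given by $\mathcal{M} = k(2d - k + t)$, $\beta' = 1$, $\beta = 2$, $\alpha = \gamma = 2d + t - 1$, $\mathcal{M}_s = k(2d - k + t) - \ell_1(2d - \ell_1 + t)$, and $n = d + t$. We use MRD codes whose length and dimension both equal $\mathcal{M}$; hence, the rank distance bound $d \le n - k + 1$ (in terms of the MRD code parameters) is saturated at $d = 1$. Accordingly, we utilize $[\mathcal{M}, \mathcal{M}, 1]$ MRD codes over $\mathbb{F}_{q^m}$, which map length-$\mathcal{M}$ vectors (each element being in $\mathbb{F}_{q^m}$) to length-$\mathcal{M}$ codewords in $\mathbb{F}_{q^m}^{\mathcal{M}}$ (with $m \ge \mathcal{M}$). The coefficients of the underlying linearized polynomial $f(g)$ are chosen as $\mathcal{M} - \mathcal{M}_s$ random symbols denoted by $\mathbf{r} \in \mathbb{F}_{q^m}^{\mathcal{M} - \mathcal{M}_s}$ and $\mathcal{M}_s$ secure data symbols denoted by $\mathbf{u} \in \mathbb{F}_{q^m}^{\mathcal{M}_s}$. The corresponding polynomial $f(g)$ is evaluated at $\mathcal{M}$ points $\{g_1, \dots, g_{\mathcal{M}}\}$, which are linearly independent over $\mathbb{F}_q$. We denote these evaluations as $x_j = f(g_j)$ for $j = 1, \dots, \mathcal{M}$. This finalizes the secrecy precoding step.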

The second encoding step is based on the encoding scheme for cooperative repair proposed in [20]. (Here, we will summarize file recovery and node repair processes for the case of MRD precoding, and provide the proof of security.) Split the $\mathcal{M}$ symbols into two parts: a) $x_1$ to $x_{nk}$, and b) $x_{nk+1}$ to $x_{nk+k(d-k)}$. (Note that $n = d + t$ and $\mathcal{M} = nk + k(d - k)$.) The first part is divided into $n$ groups of $k$ symbols, and stored in $n$ nodes. Here, node $i$ stores $x_{(i-1)k+1}$ to $x_{ik}$. The second part is divided into $d - k$ groups of $k$ symbols. These symbols are encoded with an $(n, k)$ MDS code, and stored on $n$ nodes. In particular, $\{y_{j,1}, \dots, y_{j,n}\}$ are generated from symbols $\{x_{nk+(j-1)k+1}, \dots, x_{nk+jk}\}$, and $y_{j,i}$ is stored at node $i$, for $j = 1, \dots, d - k$. Node $i$, having stored $\{x_{(i-1)k+1}, \dots, x_{ik}, y_{1,i}, \dots, y_{d-k,i}\}$, which is referred to as the primary data of node $i$, encodes these symbols using an $(n - 1, d)$ MDS code having a Vandermonde matrix $\Phi$ of size $d \times (n - 1)$ as its generator matrix. (This choice of $\Phi$ ensures that $[I_d \ \Phi]$ is a generator matrix for an $(n + d - 1, d)$ MDS code.) These $n - 1$ symbols are stored in every other node one-by-one. We denote the encoded primary data of node $i$ that is stored in node $j \ne i$ as $z_{j,i}$. We call these the secondary data. This procedure is repeated for every node, so that each node $i$ stores $\{x_{(i-1)k+1}, \dots, x_{ik}, y_{1,i}, \dots, y_{d-k,i}, z_{i,1}, \dots, z_{i,i-1}, z_{i,i+1}, \dots, z_{i,n}\}$, and hence the total number of symbols stored at each node is $k + (d - k) + (n - 1) = d + n - 1 = 2d + t - 1 = \alpha$.
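The symbol bookkeeping of this two-part layout can be checked with a short sketch. It only tracks indices of the precoded symbols $x_1, \dots, x_{\mathcal{M}}$ and counts stored symbols; it does not implement the MDS encodings. The helper names are hypothetical.

```python
def primary_layout(n, k, d):
    """Return (own, groups): 1-based indices of the precoded symbols behind the primary data.

    own[i]    -> indices of x stored uncoded at node i (part a)
    groups[j] -> indices of x feeding the j-th (n, k) MDS codeword y_{j,1..n} (part b)
    """
    assert d >= k and n >= k
    own = {i: list(range((i - 1) * k + 1, i * k + 1)) for i in range(1, n + 1)}
    groups = [list(range(n * k + (j - 1) * k + 1, n * k + j * k + 1))
              for j in range(1, d - k + 1)]
    return own, groups


def per_node_storage(n, k, d):
    """k uncoded symbols + one y symbol per group + one secondary symbol from each other node."""
    own, groups = primary_layout(n, k, d)
    return len(own[1]) + len(groups) + (n - 1)


def total_symbols(n, k, d):
    """Number of distinct precoded symbols consumed by the layout."""
    own, groups = primary_layout(n, k, d)
    return sum(len(v) for v in own.values()) + sum(len(g) for g in groups)
```

For $n = d + t$, the totals agree with $\mathcal{M} = k(2d - k + t)$ and $\alpha = 2d + t - 1$.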

File recovery at DC: DC connects to any $k$ nodes; without loss of generality, we assume the first $k$ nodes. From $y_{j,1:k}$, DC can obtain $x_{nk+(j-1)k+1}, \dots, x_{nk+jk}$, for each $j = [1 : d - k]$. It can re-encode this into $y_{j,1:n}$ using the MDS code, and obtain the other $y$ symbols at the remaining nodes. Then, for each $i \in [k + 1 : n]$, DC can use the MDS property of $[I_d \ \Phi]$ to obtain the $x_{(i-1)k+1}, \dots, x_{ik}$ symbols of node $i$ from the $k$ secondary data symbols of the contacted nodes, i.e., $z_{j,i}$ for $j = [1 : k]$, and additional $d - k$ symbols, $y_{j,i}$ for $j = [1 : d - k]$. Having obtained $x_1, \dots, x_{\mathcal{M}}$, DC can perform interpolation to solve for both data and random coefficients.

Node repair: Assume that the first $t$ nodes fail. From the secondary data stored in the remaining $d = n - t$ nodes, $z_{t+1,i}, \dots, z_{n,i}$, one can recover $x_{(i-1)k+1}, \dots, x_{ik}$ and $y_{1,i}, \dots, y_{d-k,i}$, for node $i = 1, \dots, t$. (This corresponds to sending 1 symbol from each of the $d$ nodes to each of the $t$ nodes.) Then, to recover the secondary data stored at each node under repair, say for the node $j = 1, \dots, t$, every other node, i.e., nodes $i \ne j$, including the nodes under repair, computes and sends its corresponding encoded primary data, i.e., $z_{j,i}$, to node $j$. (This corresponds to sending 1 symbol from each node to each of the $t$ nodes.) This achieves $\beta = 2$ and $\beta' = 1$ symbols for the repair procedure.

Security: Consider that the eavesdropper is observing the first $\ell_1$ nodes. Due to the code construction, the symbols in the sets $\mathcal{X} = \{x_1, \dots, x_{\ell_1 k}\}$, $\mathcal{Y} = \{y_{1,1}, \dots, y_{d-k,1}, \dots, y_{1,\ell_1}, \dots, y_{d-k,\ell_1}\}$, $\mathcal{Z} = \{z_{j,i}$ for $j = 1, \dots, \ell_1$, and $i = \ell_1 + 1, \dots, n\}$ correspond to linearly independent evaluation points. (Note that the symbols $\{z_{j,i}\}$ for $j = 1, \dots, \ell_1$; $i = 1, \dots, \ell_1$; $j \ne i$, are linear combinations of the symbols in $\mathcal{X} \cup \mathcal{Y}$.) Due to the linearized property of the code, the eavesdropper observing $\ell_1 \alpha = \ell_1(2d + t - 1)$ symbols has evaluations of the polynomial $f(\cdot)$ at $\ell_1(2d + t - \ell_1)$ linearly independent points. Using the data symbols, together with interpolation from these $\ell_1(2d + t - \ell_1)$ symbols, the eavesdropper can solve for the $\ell_1(2d + t - \ell_1)$ random symbols. Therefore, denoting the eavesdroppers' observation as $\mathbf{e}$, we have $H(\mathbf{r}|\mathbf{e}, \mathbf{u}) = 0$. As $H(\mathbf{e}) = H(\mathbf{r})$, from Lemma 3 we have $I(\mathbf{u}; \mathbf{e}) = 0$.

Using the upper bound given in Proposition 4, we obtain the following result.

Proposition 5. The secrecy capacity at MBCR point for a file size of $\mathcal{M} = k(2d - k + t)$ is given by $\mathcal{M}_s = k(2d - k + t) - \ell_1(2d - \ell_1 + t)$, if $n = d + t$.

C. Does cooperation enhance or degrade security at MBCR?

Cooperative regenerating codes have a repair bandwidth given by $\gamma = d\beta + (t - 1)\beta'$. In this section, we analyze $\gamma / \mathcal{M}_s$, the ratio of repair bandwidth to the secure file size. In the following, we refer to this parameter as the normalized repair bandwidth (NRBW).

Without the security constraints, for which $\ell_1 = 0$ in Proposition 4, we observe that at MBCR point NRBW is given by

$$\mathrm{NRBW}(\ell_1 = 0) = \frac{2d + t - 1}{k(2d - k + t)}, \tag{19}$$

which is equal to

$$\mathrm{NRBW}(\ell_1 = 0, n = d + t) = \frac{2n - t - 1}{k(2n - k - t)} \tag{20}$$

for a system with $n = d + t$. Here, the classical (i.e., non-cooperative) scenario corresponds to the $t = 1$ case, which has an NRBW of

$$\mathrm{NRBW}(\ell_1 = 0, n = d + t, t = 1) = \frac{2n - 2}{k(2n - k - 1)}. \tag{21}$$

Comparing the last two equations, we see that

$$\mathrm{NRBW}(\ell_1 = 0, n = d + t) \ge \mathrm{NRBW}(\ell_1 = 0, n = d + t, t = 1),$$

with equality iff $t = 1$. Therefore, without the security constraints, having simultaneous repairs of size greater than 1 actually increases the normalized repair bandwidth. This nature of cooperation also results in the conclusion that deliberately delaying the repairs does not bring additional savings [13]. (This observation is made for both MBCR and MSCR points in [13] with an analysis of the derivative of $\gamma$ with respect to $t$. Here, we provide an analysis with NRBW.)

We revisit the above conclusion under security constraints. The question is whether cooperation (i.e., having a system with multiple failures, or deliberately delaying the repairs) results in a loss or gain in secure DSS. A calculation similar to the above shows that NRBW for the case $t > 1$ is strictly greater than that of $t = 1$ when $n = d + t$ for $\ell_1 < k$. The MBCR points given in Proposition 4 for codes satisfying $0 \le \ell_1 < k < n$, $d \ge k$, and $d = n - t$ are given in Table I in Appendix C. As evident from the table, cooperation does not bring additional savings for secure DSS at MBCR point when $d + t = n$. This in turn means that delaying the repairs does not achieve a better performance than that of single failure-repair if $d$ is chosen such that $n = d + t$ for a given $t$, $n$. However, if the downloads within the cooperative group are less costly compared to the downloads from the live nodes, then delaying repairs would be beneficial in reducing the total cost. We will revisit this analysis for codes having $n > d + t$ in the next subsection.
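The comparison above can be reproduced with exact rational arithmetic. The sketch below computes the NRBW from $\gamma = 2d + t - 1$ and the secure file size of Proposition 4, and checks the $n = d + t$ regime, where a larger $t$ forces a smaller $d$. The function names are illustrative.

```python
from fractions import Fraction


def nrbw(k, d, t, l1=0):
    """gamma / M_s at the MBCR point, with M_s taken from Proposition 4."""
    gamma = 2 * d + t - 1
    secure_size = (k - l1) * (2 * d + t - k - l1)
    return Fraction(gamma, secure_size)


def cooperation_never_helps(n_max=8):
    """Check NRBW(t > 1) > NRBW(t = 1) whenever d = n - t, d >= k >= 2, 0 <= l1 < k."""
    for n in range(4, n_max + 1):
        for k in range(2, n):
            for l1 in range(0, k):
                single = nrbw(k, n - 1, 1, l1)
                for t in range(2, n - k + 1):      # keeps d = n - t >= k
                    if nrbw(k, n - t, t, l1) <= single:
                        return False
    return True
```

For instance, with $n = 5$, $k = 2$, $\ell_1 = 0$, the sketch gives $4/7$ for $t = 1$ and $7/12$ for $t = 2$.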



D. General Code Construction for Secure MBCR 

The code construction above requires $d = n - t$. However, in practical systems, a failed node may not be able to connect to all the remaining nodes. This necessitates code constructions for $d < n - t$. Remarkably, for a fixed $(n, k, d, \mathcal{M})$, increasing $t$ can reduce the repair bandwidth in the secrecy scenario we consider here. This is reported in [16] for DSS without secrecy constraints. Hence, for a fixed $d$, delaying the repairs can be advantageous, e.g., when there is a limit on the number of live nodes that can be connected. In the following, we present a general construction which works for any parameters, in particular for $n > d + t$.

The construction is based on the code construction proposed in [21]. In [21], a bivariate polynomial is constructed using $\mathcal{M} = k(2d + t - k)$ message symbols as the coefficients of the polynomial:

$$F(X, Y) = \sum_{\substack{0 \le i < k, \\ 0 \le j < k}} a_{ij} X^i Y^j + \sum_{\substack{0 \le i < k, \\ k \le j < d+t}} b_{ij} X^i Y^j + \sum_{\substack{k \le i < d, \\ 0 \le j < k}} c_{ij} X^i Y^j. \tag{22}$$

Given $q > n$, two sets of $n$ distinct points, $\{x_1, x_2, \dots, x_n\}$ and $\{y_1, y_2, \dots, y_n\}$, are chosen. The $i$-th node in the DSS stores the following $2d + t - 1$ evaluations of the polynomial $F(X, Y)$:

$$F(x_i, y_i), F(x_i, y_{i \oplus 1}), \dots, F(x_i, y_{i \oplus (d+t-1)}), \quad F(x_{i \oplus 1}, y_i), F(x_{i \oplus 2}, y_i), \dots, F(x_{i \oplus (d-1)}, y_i), \tag{23}$$

where $\oplus$ denotes addition modulo $n$. The first $d + t$ evaluations at node $i$ can be seen as the evaluations of the univariate polynomial $f_i(Y) = F(x_i, Y)$ of degree at most $d + t - 1$ at $d + t$ points. This uniquely defines the polynomial $f_i(Y)$. Similarly, the first evaluation in (23), $F(x_i, y_i)$, along with the last $d - 1$ evaluations uniquely define the univariate polynomial $g_i(X) = F(X, y_i)$ of degree at most $d - 1$. This property of the proposed bivariate polynomial based coding scheme is utilized for the exact node repair and data reconstruction processes at MBCR point. (We refer to [21] for details.)
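To make the evaluation pattern in (23) concrete, the sketch below builds a random polynomial on the monomial support of (22) over a small prime field and recovers the row polynomial $f_i(Y)$ and the column polynomial $g_i(X)$ of a node by Lagrange interpolation. The field size `P = 101`, the 0-based node indexing, and all helper names are assumptions made for this illustration; the repair and reconstruction protocols of [21] are not implemented.

```python
import random

P = 101  # hypothetical prime field size; any prime larger than n would do


def make_poly(k, d, t, rng):
    """Random coefficients on the support of (22), as a dict {(i, j): coefficient}."""
    support = [(i, j) for i in range(k) for j in range(d + t)]   # a_ij and b_ij terms
    support += [(i, j) for i in range(k, d) for j in range(k)]   # c_ij terms
    return {ij: rng.randrange(P) for ij in support}


def evaluate(coeffs, x, y):
    """Evaluate F(x, y) modulo P."""
    return sum(c * pow(x, i, P) * pow(y, j, P) for (i, j), c in coeffs.items()) % P


def node_points(i, n, d, t):
    """Index pairs (x-index, y-index) of the 2d + t - 1 evaluations held by node i, as in (23)."""
    row = [(i, (i + s) % n) for s in range(d + t)]
    col = [((i + s) % n, i) for s in range(1, d)]
    return row + col


def lagrange_eval(xs, ys, x0):
    """Value at x0 of the unique polynomial of degree < len(xs) through the points (xs, ys)."""
    total = 0
    for a, (xa, ya) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for b, xb in enumerate(xs):
            if a != b:
                num = num * (x0 - xb) % P
                den = den * (xa - xb) % P
        total = (total + ya * num * pow(den, -1, P)) % P   # modular inverse, Python 3.8+
    return total
```

Interpolating the $d + t$ row evaluations of a node predicts $F(x_i, y)$ at any other $y$, and the $d$ column evaluations predict $F(x, y_i)$ at any other $x$, which is the property used for repair.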

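In order to get an $(\ell_1, 0)$-secure code at MBCR point, we rewrite the polynomial in (22) as follows:

$$\begin{aligned} F(X, Y) = {} & \sum_{\substack{0 \le i < \ell_1, \\ 0 \le j < \ell_1}} a_{ij} X^i Y^j + \sum_{\substack{0 \le i < \ell_1, \\ \ell_1 \le j < k}} a_{ij} X^i Y^j + \sum_{\substack{\ell_1 \le i < k, \\ 0 \le j < \ell_1}} a_{ij} X^i Y^j + \sum_{\substack{\ell_1 \le i < k, \\ \ell_1 \le j < k}} a_{ij} X^i Y^j \\ & + \sum_{\substack{0 \le i < \ell_1, \\ k \le j < d+t}} b_{ij} X^i Y^j + \sum_{\substack{\ell_1 \le i < k, \\ k \le j < d+t}} b_{ij} X^i Y^j + \sum_{\substack{k \le i < d, \\ 0 \le j < \ell_1}} c_{ij} X^i Y^j + \sum_{\substack{k \le i < d, \\ \ell_1 \le j < k}} c_{ij} X^i Y^j. \end{aligned} \tag{24}$$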

Next, we choose $\ell_1^2 + \ell_1(k - \ell_1) + (k - \ell_1)\ell_1 + \ell_1(d + t - k) + (d - k)\ell_1 = \ell_1(2d + t - \ell_1)$ coefficients of $F(X, Y)$, namely $\{a_{ij}\}_{0 \le i < \ell_1, 0 \le j < \ell_1}$, $\{a_{ij}\}_{0 \le i < \ell_1, \ell_1 \le j < k}$, $\{a_{ij}\}_{\ell_1 \le i < k, 0 \le j < \ell_1}$, $\{b_{ij}\}_{0 \le i < \ell_1, k \le j < d+t}$, and $\{c_{ij}\}_{k \le i < d, 0 \le j < \ell_1}$, to be random symbols drawn from $\mathbb{F}_q$ in an i.i.d. manner. The remaining $k(2d + t - k) - \ell_1(2d + t - \ell_1) = \mathcal{M}_s$ coefficients of $F(X, Y)$ are chosen to be the data symbols that need to be stored on the DSS. Each node $i \in [n]$ stores the evaluations of $F(X, Y)$ as illustrated in (23). It follows from the description of the coding scheme of [21] in the beginning of this subsection that the resulting coding scheme is an exact repairable code at MBCR point.

Fig. 4: Observed symbols at the eavesdroppers for a given $\ell_1$.
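The split of coefficients into random and data symbols can be verified by counting monomials. In the sketch below (helper names are hypothetical), a monomial $X^i Y^j$ on the support of (22) carries a random symbol exactly when $i < \ell_1$ or $j < \ell_1$, which is the union of the five index sets listed above.

```python
def support(k, d, t):
    """Exponent pairs (i, j) of the monomials appearing in (22)."""
    return ([(i, j) for i in range(k) for j in range(d + t)]
            + [(i, j) for i in range(k, d) for j in range(k)])


def split_coefficients(k, d, t, l1):
    """Return (random_positions, data_positions) for an (l1, 0)-secure code."""
    rand = [(i, j) for (i, j) in support(k, d, t) if i < l1 or j < l1]
    data = [(i, j) for (i, j) in support(k, d, t) if i >= l1 and j >= l1]
    return rand, data


def counts_match(k, d, t, l1):
    """Compare the counts with l1(2d + t - l1) and the secure file size of Proposition 4."""
    rand, data = split_coefficients(k, d, t, l1)
    return (len(rand) == l1 * (2 * d + t - l1)
            and len(data) == (k - l1) * (2 * d + t - k - l1))
```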

Next, we show that the proposed scheme is indeed $(\ell_1, 0)$-secure. If $\mathbf{e}$, $\mathbf{u}$, and $\mathbf{r}$ denote the data observed by the eavesdropper, the original data to be stored, and the randomness added to the original data before encoding, respectively, then it is sufficient to show (i) $H(\mathbf{e}) \le H(\mathbf{r})$ and (ii) $H(\mathbf{r}|\mathbf{u}, \mathbf{e}) = 0$ in order to establish the secrecy claim (see Lemma 3). To argue the first requirement, noting that the number of eavesdropped symbols is $\ell_1 \alpha = \ell_1(2d + t - 1)$, we will show that $\ell_1^2 - \ell_1$ of these are linearly dependent on the remaining ones. The eavesdropper, without loss of generality considering the first $\ell_1$ nodes, observes the symbols given in Fig. 4. Due to the code construction, each row there represents evaluations of a polynomial of degree less than $d + t$ and each column represents evaluations of a polynomial of degree less than $d$. Hence, we observe that each of the symbols denoted with bold (blue) font in the matrix of Fig. 4 is a linear combination of the remaining ones. Therefore, $H(\mathbf{e}) = \ell_1 \alpha - \ell_1(\ell_1 - 1) = H(\mathbf{r})$.

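In order to show that the second requirement also holds, we present a method to decode the randomness $\mathbf{r}$ given $\mathbf{u}$ and the data stored on any $\ell_1$ nodes. Once we know the data symbols $\mathbf{u}$, we can remove the monomials associated with the data symbols in $F(X, Y)$ and the contribution of these monomials from the polynomial evaluations stored on the DSS. Let $\tilde{F}(X, Y)$ denote the bivariate polynomial that we obtain by removing the data monomials:

$$\begin{aligned} \tilde{F}(X, Y) = {} & \sum_{\substack{0 \le i < \ell_1, \\ 0 \le j < \ell_1}} a_{ij} X^i Y^j + \sum_{\substack{0 \le i < \ell_1, \\ \ell_1 \le j < k}} a_{ij} X^i Y^j + \sum_{\substack{\ell_1 \le i < k, \\ 0 \le j < \ell_1}} a_{ij} X^i Y^j \\ & + \sum_{\substack{0 \le i < \ell_1, \\ k \le j < d+t}} b_{ij} X^i Y^j + \sum_{\substack{k \le i < d, \\ 0 \le j < \ell_1}} c_{ij} X^i Y^j. \end{aligned} \tag{25}$$

$\tilde{F}(X, Y)$ can be rewritten as:

$$\tilde{F}(X, Y) = \sum_{\substack{0 \le i < \ell_1, \\ 0 \le j < \ell_1}} \tilde{a}_{ij} X^i Y^j + \sum_{\substack{0 \le i < \ell_1, \\ \ell_1 \le j < d+t}} \tilde{b}_{ij} X^i Y^j + \sum_{\substack{\ell_1 \le i < d, \\ 0 \le j < \ell_1}} \tilde{c}_{ij} X^i Y^j, \tag{26}$$

where

$$\begin{aligned} \{\tilde{a}_{ij}\}_{0 \le i < \ell_1,\, 0 \le j < \ell_1} &= \{a_{ij}\}_{0 \le i < \ell_1,\, 0 \le j < \ell_1}, \\ \{\tilde{b}_{ij}\}_{0 \le i < \ell_1,\, \ell_1 \le j < k} &= \{a_{ij}\}_{0 \le i < \ell_1,\, \ell_1 \le j < k}, \\ \{\tilde{b}_{ij}\}_{0 \le i < \ell_1,\, k \le j < d+t} &= \{b_{ij}\}_{0 \le i < \ell_1,\, k \le j < d+t}, \\ \{\tilde{c}_{ij}\}_{\ell_1 \le i < k,\, 0 \le j < \ell_1} &= \{a_{ij}\}_{\ell_1 \le i < k,\, 0 \le j < \ell_1}, \\ \{\tilde{c}_{ij}\}_{k \le i < d,\, 0 \le j < \ell_1} &= \{c_{ij}\}_{k \le i < d,\, 0 \le j < \ell_1}. \end{aligned}$$

$\tilde{F}(X, Y)$ in (26) takes the same form as $F(X, Y)$ in (22) with $k$ replaced with $\ell_1$. Therefore the randomness $\mathbf{r}$, i.e., the coefficients of $\tilde{F}(X, Y)$ in (26), can be decoded from the data observed on $\ell_1$ nodes using the data reconstruction method described in [21]. Thus, we obtain the following result.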

Proposition 6. The secrecy capacity at MBCR point for a file size of $\mathcal{M} = k(2d - k + t)$ is given by $\mathcal{M}_s = k(2d - k + t) - \ell_1(2d - \ell_1 + t)$ for any $n \ge d + t$.

We list some instances of this construction in Table II in Appendix C. As evident from the table, cooperation helps to reduce the repair bandwidth if $d < n - t$. Thus, (secure) coding schemes for the case of $n > d + t$ are of significant interest in order to reduce the repair bandwidth in cooperative repair.
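When $d$ is held fixed (which requires $n \ge d + t$), the trend reverses compared with the $n = d + t$ regime. The sketch below, with illustrative function names, computes the ratio of the cooperative NRBW to the $t = 1$ NRBW at the same $d$, using the secure file size of Proposition 6.

```python
from fractions import Fraction


def nrbw(k, d, t, l1):
    """gamma / M_s at the MBCR point, with M_s = (k - l1)(2d + t - k - l1)."""
    return Fraction(2 * d + t - 1, (k - l1) * (2 * d + t - k - l1))


def cooperation_gain(k, d, t, l1):
    """Ratio NRBW(t) / NRBW(t = 1) for a fixed d; values below 1 mean cooperation helps."""
    return nrbw(k, d, t, l1) / nrbw(k, d, 1, l1)
```

For example, with $k = 2$, $d = 3$, $\ell_1 = 1$, moving from $t = 1$ to $t = 2$ changes the NRBW from $3/2$ to $7/5$, a ratio of $14/15$.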

IV. Secure MSCR Codes 

We first consider an upper bound on the secure file size, and then utilize appropriate secrecy precoding mechanisms to construct achievable schemes.

A. Upper bound on the secure file size 

At MSCR point, the nodes have the minimum possible storage, i.e., $\alpha = \frac{\mathcal{M}}{k}$. Using the cut-set analysis given in the previous section, one can obtain that the minimum repair bandwidth can be attained with $\beta = \beta' = \frac{\mathcal{M}}{k} \cdot \frac{1}{d - k + t}$. (See also [13], (15).) At MSCR, therefore, the downloaded data can be larger than the data stored in the nodes. Thus, for secrecy constraints, we consider two eavesdropper types: storage-only eavesdroppers ($\mathcal{E}_1$) and storage-and-download eavesdroppers ($\mathcal{E}_2$). Using the sizes of these sets, we denote the eavesdropper setting with $(\ell_1, \ell_2)$ as introduced in Section II. Here, the eavesdroppers in $\mathcal{E}_2$ observe both the data downloaded from live nodes and the data downloaded from the cooperating nodes. Similar to the secure file size bound analysis given in the previous section, we obtain the following bound.

$$\mathcal{M}_s \le \sum_{i=0}^{k-1} (1 - l_i^1 - l_i^2) \min\left\{\alpha - I(s_i; d_{i,\mathcal{E}_2}),\ (d - i)\beta + (t - 1)\beta'\right\}. \tag{28}$$

Here, we consider that the $u_i = 1$ node of stage $i$ includes $l_i^1$ eavesdroppers from $\mathcal{E}_1$ and $l_i^2$ eavesdroppers from $\mathcal{E}_2$. Compared to the MBCR bounds, due to eavesdroppers in $\mathcal{E}_2$, nodes that are not eavesdropped may leak information during their participation in the repair of a node having an $\mathcal{E}_2$ type eavesdropper. Thus, the values of the cuts of type 1 include additional penalty terms $I(s_i; d_{i,\mathcal{E}_2})$, counting the leakage from the storage at the $i$-th node to nodes indexed with $\mathcal{E}_2$. (Here, the cut value can be written as $H(s_i | d_{i,\mathcal{E}_2}) = H(s_i) - I(s_i; d_{i,\mathcal{E}_2})$.) Considering the MSCR point values of $\alpha$, $\beta$, and $\beta'$ given above, the second term of (28) will be loose. (The cases considered for (12) and (14), when specialized to the MSCR point, do not give a tighter bound than that of (28).) Hence, considering that the first $k - \ell_1 - \ell_2$ repairs are eavesdropper-free, (28) will evaluate to the following bound.

Proposition 7. Cooperative regenerating codes operating at the MSCR point with a secure file size of $\mathcal{M}_s$ satisfy

$$\mathcal{M}_s \le \sum_{i=0}^{k - \ell_1 - \ell_2 - 1} \left(\alpha - I(s_i; d_{i,\mathcal{E}_2})\right), \tag{29}$$

where the MSCR point is given by $\beta = \beta' = 1$, $\alpha = d - k + t$, for a file size of $\mathcal{M} = k(d - k + t)$. In addition, at MSCR, one can bound $I(s_i; d_{i,\mathcal{E}_2}) \ge \beta' = \beta$ and obtain the bound

$$\mathcal{M}_s \le (k - \ell_1 - \ell_2)(\alpha - \beta).$$

B. Code construction for secure MSCR when k = t = 2 

We consider an interference alignment approach based on the one proposed in [19], considering $k = t = 2$. (See also [48], [49].) For any $(n, k, d, t)$ with $d \ge k$ and $n = d + t$, we have $\alpha = d - k + t = n - 2$, and $\mathcal{M} = k(d - k + t) = 2(d - k + t) = 2\alpha$ at MSCR point. From the bound given in Proposition 7, a positive secure file size is achievable only when $(\ell_1, \ell_2) = (1, 0)$ or $(\ell_1, \ell_2) = (0, 1)$ when $k = 2$. The corresponding bounds are given by $\mathcal{M}_s \le \alpha$ and $\mathcal{M}_s \le \alpha - 1$, respectively. (For the latter bound, as $|\mathcal{E}_2| = 1$, $d_{i,\mathcal{E}_2}$ necessarily consists of one symbol as $\beta = \beta' = 1$, and the non-eavesdropped node participates in the repair of the eavesdropped node by sending $\beta$ or $\beta'$ symbols.) In the following, we construct codes achieving the stated bounds for both cases, hence establishing the secrecy capacity when $k = t = 2$. We show this with codes having $n = d + t$, i.e., all the nodes participate in the repair. The construction can be extended to cases with $n > d + t$ by following a similar approach and choosing a larger field size.
Case 1: $\mathcal{M}_s = \alpha$ when $(\ell_1, \ell_2) = (1, 0)$

Consider a finite field size of $q = n - 1$ with generator $w$, $\alpha$ random symbols $r_1, \dots, r_\alpha$, and $\alpha$ secure information symbols $s_1, \dots, s_\alpha$. Both information and random symbols are uniformly distributed over the field. We construct the file given by $\mathcal{M} = \{a_1 = r_1, \dots, a_\alpha = r_\alpha, b_1 = r_1 + s_1, \dots, b_\alpha = r_\alpha + s_\alpha\}$, and consider the following placement

• Store $\mathbf{a} = (a_1, \dots, a_\alpha)^T$ at the first node,

• Store $\mathbf{b} = (b_1, \dots, b_\alpha)^T$ at the second node, and

• Store $\mathbf{r}_i = (a_1 + w^{(i-1) \bmod \alpha} b_1, \dots, a_\alpha + w^{(i+\alpha-2) \bmod \alpha} b_\alpha)^T$ at the $i$-th redundancy node, $i \in \{1, \dots, \alpha\}$.

The data collector can reconstruct the file $\mathcal{M}$ by contacting any $k$ nodes, and solving $\alpha$ groups of 2 equations over 2 unknowns for each group. From the file $\mathcal{M}$, it can then obtain the secure symbols $s_1, \dots, s_\alpha$. For cooperative repair, considering the repair of the first systematic node, the $i$-th redundancy node, storing $\mathbf{r}_i = \mathbf{a} + \mathbf{B}_i \mathbf{b}$, sends $\mathbf{v}_{1,i} \mathbf{r}_i = \mathbf{w}_{1,i} \mathbf{a} + \mathbf{z}\mathbf{b}$, where $\mathbf{v}_{1,i} = \mathbf{z}\mathbf{B}_i^{-1}$ and $\mathbf{z} = (1, \dots, 1)$. (Repair of the second systematic node is symmetric to the first one, and, without loss of generality, we consider the repair of the two systematic nodes. Repairs of stages involving redundancy nodes can be performed as that of the systematic nodes after a change of variables.) The second systematic node, having received $\mathbf{c}_2 = \{\mathbf{v}_{2,1}\mathbf{r}_1, \dots, \mathbf{v}_{2,d}\mathbf{r}_d\}$, chooses the repair vector $\mathbf{v}_{1,0}$ such that $\mathbf{v}_{1,0}\mathbf{c}_2 = \mathbf{w}_{1,0}\mathbf{a} + \mathbf{z}\mathbf{b}$; and sends $\mathbf{v}_{1,0}\mathbf{c}_2$ to the first systematic node. Then, the first systematic node solves $d + 1$ equations $\{\mathbf{w}_{1,0}\mathbf{a} + \mathbf{z}\mathbf{b}, \mathbf{w}_{1,1}\mathbf{a} + \mathbf{z}\mathbf{b}, \dots, \mathbf{w}_{1,d}\mathbf{a} + \mathbf{z}\mathbf{b}\}$ in $d + 1$ unknowns $\{a_1, \dots, a_\alpha, \mathbf{z}\mathbf{b}\}$. Noting that the regeneration and repair are similar to the ones proposed in [19], it remains to show the secrecy of the file. Here, regardless of the eavesdropped node being a systematic or a parity node, given the secure symbols, $\mathbf{u} = s_1, \dots, s_\alpha$, the eavesdropper can obtain $\alpha$ equations in $\alpha$ unknowns $\mathbf{r} = r_1, \dots, r_\alpha$; and solve for $\mathbf{r}$. This shows that $H(\mathbf{r}|\mathbf{u}, \mathbf{e}) = 0$, where the eavesdropper observes the content of the eavesdropped node, i.e., $\mathbf{e} = s_{\mathcal{E}_1}$. We see that, at the eavesdropped node, the content of the stored data necessarily satisfies $H(\mathbf{e}) = H(s_{\mathcal{E}_1}) = \alpha$. Then, as the code satisfies both $H(\mathbf{e}) \le H(\mathbf{r})$ and $H(\mathbf{r}|\mathbf{u}, \mathbf{e}) = 0$, we obtain from Lemma 3 that $I(\mathbf{u}; \mathbf{e}) = I(s_1, \dots, s_\alpha; s_{\mathcal{E}_1}) = 0$.
Case 2: $\mathcal{M}_s = \alpha - 1$ when $(\ell_1, \ell_2) = (0, 1)$

We modify the above construction by considering the file given by $\mathcal{M} = \{a_1 = r_1, \dots, a_\alpha = r_\alpha, b_1 = r_1 + s_1, \dots, b_{\alpha-1} = r_{\alpha-1} + s_{\alpha-1}, b_\alpha = r_{\alpha+1}\}$. The regeneration and repair parts are the same as in the previous case. We show that the secrecy constraint is satisfied here. The content of the eavesdropped node $s_{\mathcal{E}_2}$ is generated from the downloaded data $d_{\mathcal{E}_2}$. Thus, we need to show $I(\mathbf{u}; \mathbf{e}) = 0$ with $\mathbf{u} = \{s_1, \dots, s_{\alpha-1}\}$ and $\mathbf{e} = d_{\mathcal{E}_2}$. Without loss of generality, we consider the eavesdropper observing the first systematic node. Considering the repair process described above, we have $\mathbf{e} = d_{\mathcal{E}_2} = \{\mathbf{w}_{1,0}\mathbf{a} + \mathbf{z}\mathbf{b}, \mathbf{w}_{1,1}\mathbf{a} + \mathbf{z}\mathbf{b}, \dots, \mathbf{w}_{1,d}\mathbf{a} + \mathbf{z}\mathbf{b}\}$, from which we obtain that $H(\mathbf{e}) \le \alpha + 1$. In addition, as the eavesdropper can solve for $(\mathbf{a}, \mathbf{z}\mathbf{b})$, it can solve for $\mathbf{r} = \{r_1, \dots, r_{\alpha+1}\}$ from the $\alpha + 1$ equations in $(\mathbf{a}, \mathbf{z}\mathbf{b})$, after canceling out the secure symbols $\mathbf{u} = \{s_1, \dots, s_{\alpha-1}\}$. This shows that $H(\mathbf{r}|\mathbf{u}, \mathbf{e}) = 0$. From this, together with $H(\mathbf{r}) = \alpha + 1$ and Lemma 3, we obtain that $I(\mathbf{u}; \mathbf{e}) = I(s_1, \dots, s_{\alpha-1}; d_{\mathcal{E}_2}) = 0$.

Proposition 8. The secrecy capacity at MSCR point for a file size of $\mathcal{M} = k(d - k + t)$ is given by $\mathcal{M}_s = \alpha$, if $(\ell_1, \ell_2) = (1, 0)$ and $k = t = 2$; and by $\mathcal{M}_s = \alpha - 1$, if $(\ell_1, \ell_2) = (0, 1)$ and $k = t = 2$.

C. Code construction for secure MSCR when d = k 

The above construction is limited to the $k = 2$ case. Here, we provide a secure MSCR code when $d = k$, hence allowing $k > 2$. (Note that as $d \ge k > \ell_1 + \ell_2$, we necessarily have $\ell_1 + \ell_2 < d = k$ here.) Again, we apply the two-stage encoding, using an MRD code as the secrecy precoding.

Consider $\mathcal{M} = k(d - k + t) = kt$, $\beta = \beta' = 1$, $\alpha = d - k + t = t$, $\mathcal{M}_s = kt - (\ell_1 + \ell_2)t - \ell_2(k - \ell_1 - \ell_2)$, and $n \ge d + t$. We encode the data using the linearized polynomial $f(g) = \sum_{i=0}^{\mathcal{M}-1} u_i g^{q^i}$. (This is the Gabidulin construction of MRD codes [44] summarized in Section III-B.) The coefficients of $f(g)$ are chosen as $\mathcal{M}_s$ data symbols denoted by $\mathbf{u}$ and $\mathcal{M} - \mathcal{M}_s$ random symbols denoted by $\mathbf{r}$. The function $f(g)$ is evaluated at $\mathcal{M}$ points $\{g_1, \dots, g_{\mathcal{M}}\}$ in $\mathbb{F}_{q^m}$ that are linearly independent over $\mathbb{F}_q$. (Here, the data and random symbols belong to $\mathbb{F}_{q^m}$ with $m \ge \mathcal{M}$.) We denote these evaluations as $x_i = f(g_i)$ for $i = 1, \dots, \mathcal{M} = kt$. We consider the code provided in [11] in the secrecy setting here. We place the $\mathcal{M} = kt$ symbols into vectors $\mathbf{m}_1, \dots, \mathbf{m}_t$, each having $k$ symbols. We encode these vectors with a Vandermonde matrix of size $k \times n$, whose columns are represented as $\mathbf{g}_i$ for $i = 1, \dots, n$. We store $\mathbf{m}_j \mathbf{g}_i$, $j = 1, \dots, t$, at node $i$. The data collector, by contacting any $k$ nodes, can obtain $k$ equations for each of the $\mathbf{m}_j$, and solve them to obtain $x_i$ for $i = 1, \dots, \mathcal{M} = kt$. It can then obtain the secure data symbols by interpolation. For node repair, consider that node $j \in [t]$ contacts $d = k$ live nodes, named $j_1$ to $j_k$. It will download $\mathbf{m}_j \mathbf{g}_{j_l}$ from live node $j_l$ for $l = 1, \dots, k$. Node $j$ then will obtain $\mathbf{m}_j$ by solving these $k$ equations, and send $\mathbf{m}_j \mathbf{g}_{j'}$ to $j' \ne j$, $j' \in [t]$, the remaining nodes under repair. Each node $j \in [t]$ will repeat this procedure. (Then, node $j$ will also recover its $\mathbf{m}_{j'} \mathbf{g}_j$ by downloading a symbol from each of the other nodes under repair.)
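The repair procedure for $d = k$ can be sketched as follows. This is a minimal illustration under simplifying assumptions: it works over a small prime field (`P = 257`) instead of $\mathbb{F}_{q^m}$, omits the MRD precoding, treats each $\mathbf{m}_j$ as the coefficient vector of a polynomial so that $\mathbf{m}_j \mathbf{g}_i$ is an evaluation at the point of node $i$, and uses Lagrange interpolation in place of an explicit Vandermonde solve. All names are hypothetical.

```python
import random

P = 257  # hypothetical prime field standing in for F_{q^m}


def poly_eval(coeffs, x):
    """Horner evaluation of m(x) = sum_c coeffs[c] * x^c modulo P, i.e., m . g for g = (1, x, ...)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def lagrange_eval(xs, ys, x0):
    """Value at x0 of the unique polynomial of degree < len(xs) through the points (xs, ys)."""
    total = 0
    for a, (xa, ya) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for b, xb in enumerate(xs):
            if a != b:
                num = num * (x0 - xb) % P
                den = den * (xa - xb) % P
        total = (total + ya * num * pow(den, -1, P)) % P
    return total


def encode(msgs, points):
    """Node i stores one symbol per message vector: [m_1 . g_i, ..., m_t . g_i]."""
    return [[poly_eval(m, x) for m in msgs] for x in points]


def cooperative_repair(nodes, points, failed, k):
    """Repair len(failed) = t nodes; the idx-th failed node is responsible for message idx."""
    live = [i for i in range(len(points)) if i not in failed]
    helpers = live[:k]                                  # d = k live helper nodes
    xs = [points[h] for h in helpers]
    repaired = {f: [None] * len(failed) for f in failed}
    for idx in range(len(failed)):
        ys = [nodes[h][idx] for h in helpers]           # phase 1: one symbol from each helper
        for f in failed:                                # phase 2: one symbol to each failed node
            repaired[f][idx] = lagrange_eval(xs, ys, points[f])
    return repaired
```

Each node under repair downloads $k$ symbols from live nodes and $t - 1$ symbols from the other nodes under repair, matching $\beta = \beta' = 1$.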

We here show that the secrecy constraint is met assuming $\ell_2 < t$. (Otherwise, this construction cannot achieve a positive secure file size, as the $\mathcal{E}_2$ eavesdroppers can obtain all $\mathbf{m}_{[1:t]}$ symbols from their downloads.) We observe that the $\ell_2$ nodes under repair obtain $\ell_2 k$ equations from the live nodes (these will reveal $\ell_2$ of the $\mathbf{m}_j$'s), and store additional $\ell_2(t - \ell_2) = \ell_2(\alpha - \ell_2)$ symbols received from the remaining nodes under repair. The $\ell_1$ nodes observe $\ell_1 \alpha$ symbols. However, $\ell_1 \ell_2$ of these symbols are linearly dependent on the ones downloaded by the $\ell_2$ nodes (as the $\ell_2$ nodes have the knowledge of $\ell_2$ of the $\mathbf{m}_j$'s). Therefore, using the given polynomial and the secure data of length $\mathcal{M}_s$, the eavesdropper can solve for the random symbols using these $\ell_2(k + \alpha - \ell_2) + \ell_1(\alpha - \ell_2) = \ell_2(k + t - \ell_2) + \ell_1(t - \ell_2) = (k - \ell_1 - \ell_2)\ell_2 + (\ell_1 + \ell_2)t = \mathcal{M} - \mathcal{M}_s$ linearly independent evaluations of the polynomial. Thus, we have $H(\mathbf{r}|\mathbf{u}, \mathbf{e}) = 0$, where $\mathbf{e}$ denotes the observations of the $\mathcal{E}_1$ and $\mathcal{E}_2$ eavesdroppers. This construction also satisfies $H(\mathbf{e}) = \ell_2 k + (\alpha - \ell_2)\ell_2 + \ell_1(\alpha - \ell_2) = H(\mathbf{r})$ as argued above, and it follows from Lemma 3 that we have $I(\mathbf{u}; \mathbf{e}) = 0$. This code achieves the secure file size of $kt - (\ell_1 + \ell_2)t - \ell_2(k - \ell_1 - \ell_2)$ when $\ell_2 < t$.

Proposition 9. The secure file size of $\mathcal{M}_s = (k - \ell_1 - \ell_2)[t - \ell_2]^+$ is achievable at the MSCR point for a file size of $\mathcal{M} = k(d - k + t)$ when $d = k$.

Note that this achieves the secrecy capacity when $\ell_2 \le 1$ for any $\ell_1$, as can be observed from the bound given by Proposition 7.
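The counting in the security argument and the tightness claim for $\ell_2 \le 1$ can be checked numerically. The sketch below uses hypothetical function names; in `upper_bound`, the $(\alpha - \beta)$ form of Proposition 7 is applied only when $\ell_2 \ge 1$, on the assumption that the leakage term in (29) is zero when there is no $\mathcal{E}_2$ eavesdropper.

```python
def achievable(k, t, l1, l2):
    """Secure file size of Proposition 9 (d = k, alpha = t)."""
    return (k - l1 - l2) * max(t - l2, 0)


def observed_independent(k, t, l1, l2):
    """Linearly independent symbols seen by the eavesdroppers, as counted in the text (l2 < t)."""
    return l2 * (k + t - l2) + l1 * (t - l2)


def upper_bound(k, t, l1, l2):
    """Proposition 7 at d = k with alpha = t, beta = 1 (assumption: no leakage term if l2 = 0)."""
    alpha, beta = t, 1
    return (k - l1 - l2) * (alpha - beta if l2 >= 1 else alpha)
```

For $\ell_2 < t$, the file size $kt$ minus `observed_independent` equals `achievable`, and for $\ell_2 \in \{0, 1\}$ the achievable size meets `upper_bound`.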

V. Conclusion 

DSS store data in multiple nodes. These systems not only require resilience against node failures, but also have to satisfy security constraints and to perform multiple node repair. Regenerating codes proposed for DSS address the node failure resilience while efficiently trading off storage vs. repair bandwidth. In this paper, we considered secure cooperative regenerating codes for DSS. The eavesdropper model analyzed in this paper belongs to the class of passive attack models, where the eavesdroppers observe the content of the nodes in the system. Accordingly, we considered an $(\ell_1, \ell_2)$-eavesdropper, where the storage content of any $\ell_1$ nodes, and the download content of any $\ell_2$ nodes are leaked to the eavesdropper. With such an eavesdropper model, we studied the security for the multiple repair scenario, in particular secure cooperative regenerating codes. For the minimum bandwidth cooperative regenerating (MBCR) point, we established a bound on the secrecy capacity, and by modifying the existing coding schemes in the literature, devised new codes achieving the secrecy capacity. For the minimum storage cooperative regenerating (MSCR) point, on the other hand, we proposed an upper bound and lower bounds on the secure file size, which match under special cases. The results show that it is possible to design regenerating codes that not only efficiently trade storage vs. repair bandwidth, but are also resilient against security attacks in a cooperative repair scenario. Finally, as evident from some of our secrecy-achieving constructions, we would like to emphasize the role that maximum rank distance (MRD) codes can take in secrecy problems. In particular, we have utilized the Gabidulin construction [44] of MRD codes and properties of linearized polynomials in obtaining some of the results. Similar properties of such codes have been utilized to achieve secrecy in earlier works [50]-[53], and they proved their potential again here as an essential component for achieving secrecy in DSS.

We list some avenues for further research here. The secrecy capacity of MSCR codes remains an open problem, as we have established the optimal codes only under some parameter settings. To attempt this problem, codes for MSCR without security constraints have to be further investigated. One can also consider cooperative repair in a DSS having a locally repairable structure. Like other distributed systems, DSS may exhibit simultaneous node failures that need to be recovered with local connections. To the best of our knowledge, this setting has not been studied (even without security constraints). Our ongoing efforts are on the design of coding schemes for DSS satisfying these properties.

Appendix A 
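Proof of Lemma 3

Proof: The proof follows from the classical techniques given by [25], where instead of 0-leakage, $\epsilon$-leakage rate is considered. (The application of this technique in DSS is also considered in [27].) We have

\begin{align}
I(\mathbf{u}; \mathbf{e}) &= H(\mathbf{e}) - H(\mathbf{e}|\mathbf{u}) \tag{30} \\
&\overset{(a)}{\le} H(\mathbf{e}) - H(\mathbf{e}|\mathbf{u}) + H(\mathbf{e}|\mathbf{u}, \mathbf{r}) \tag{31} \\
&\overset{(b)}{\le} H(\mathbf{r}) - I(\mathbf{e}; \mathbf{r}|\mathbf{u}) \tag{32} \\
&\overset{(c)}{=} H(\mathbf{r}|\mathbf{u}, \mathbf{e}) \tag{33} \\
&\overset{(d)}{=} 0, \tag{34}
\end{align}

where (a) follows by non-negativity of $H(\mathbf{e}|\mathbf{u}, \mathbf{r})$, (b) is the condition $H(\mathbf{e}) \le H(\mathbf{r})$, (c) is due to $H(\mathbf{r}|\mathbf{u}) = H(\mathbf{r})$ as $\mathbf{r}$ and $\mathbf{u}$ are independent, and (d) is the condition $H(\mathbf{r}|\mathbf{u}, \mathbf{e}) = 0$.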
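The two conditions of the lemma can be observed on a toy example by brute force. The sketch below is an illustration only (it is not one of the paper's code constructions): a uniform data symbol $u$ and an independent uniform random symbol $r$ over the integers modulo a small prime `p` are combined as $e = r + u$, so that $H(e) \le H(r)$ and $r$ is determined by $(u, e)$. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """I(U; E) in bits, for a list of equally likely (u, e) outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    pu = Counter(u for u, _ in pairs)
    pe = Counter(e for _, e in pairs)
    return sum(c / n * log2(c * n / (pu[u] * pe[e])) for (u, e), c in joint.items())


def observations(p, masked=True):
    """All equally likely (u, e) pairs: e = r + u mod p if masked, else e = u (no randomness)."""
    return [(u, (r + u) % p if masked else u) for u, r in product(range(p), repeat=2)]
```

With masking, the computed leakage is zero; without it, the eavesdropper learns all $\log_2 p$ bits of $u$.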

Remark 10. If the eavesdropper has a vanishing probability of error in decoding $\mathbf{r}$ given $\mathbf{e}$ and $\mathbf{u}$, then, by Fano's inequality, one can write $H(\mathbf{r}|\mathbf{u}, \mathbf{e}) \le |\mathbf{r}|\epsilon$, and, by following the above steps, can show the bound $I(\mathbf{u}; \mathbf{e}) \le |\mathbf{r}|\epsilon$, where $|\mathbf{r}|$ is the number of random bits, and $\epsilon$ can be made small if the probability of error is vanishing. This shows that the leakage rate $I(\mathbf{u}; \mathbf{e})/|\mathbf{e}|$ is vanishing. (See, e.g., [25].)



Appendix B 
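Proof of Proposition 4

Proof: (11) evaluates to the following bound at the MBCR point for a file size of $\mathcal{M} = k(2d - k + t)$.

$$\mathcal{M}_s \le \mathcal{M}_s^{*} = k(2d - k + t) - \ell_1(2d - \ell_1 + t) = (k - \ell_1)(2d + t - k - \ell_1). \tag{35}$$

We compare this with the bounds (12) and (14).

• If $t \ge k$: (12) evaluates to the following bound

$$\mathcal{M}_s \le \mathcal{M}_s^{1} = (k - \ell_1)(2d + t - k). \tag{36}$$

Here, as $\ell_1 \ge 0$, $\mathcal{M}_s^{*} \le \mathcal{M}_s^{1}$. Hence, (35) gives a tighter bound.

• If $t < k$ and $\ell_1 \le at$ with $a = \lfloor k/t \rfloor$: Let $\ell_1 = \tilde{a}t + \tilde{b}$, where $\tilde{a} = \lfloor \ell_1/t \rfloor$ and $\tilde{b} \in [0 : t - 1]$. The expression of $S$ in (15) at MBCR point is given by

\begin{align}
S &= 2\ell_1(d - \tilde{a}t) + t^2 \tilde{a}(\tilde{a} + 1) \nonumber \\
&\overset{(a)}{=} \ell_1(2d - 2\ell_1 + 2\tilde{b}) + (\ell_1 - \tilde{b})(\ell_1 - \tilde{b} + t) \nonumber \\
&= \ell_1(2d - \ell_1 + \tilde{b} + t) - \tilde{b}(\ell_1 - \tilde{b} + t) \nonumber \\
&= \ell_1(2d - \ell_1 + t) - \tilde{b}(t - \tilde{b}), \tag{37}
\end{align}

where in (a) we used $\tilde{a}t = \ell_1 - \tilde{b}$. Therefore, (14) evaluates to

$$\mathcal{M}_s \le \mathcal{M}_s^{2} = k(2d + t - k) - \ell_1(2d - \ell_1 + t) + \tilde{b}(t - \tilde{b}). \tag{38}$$

As $\tilde{b}(t - \tilde{b}) \ge 0$, we obtain that $\mathcal{M}_s^{*} \le \mathcal{M}_s^{2}$. Hence, (35) gives a tighter bound.

• If $t < k$ and $\ell_1 > at$ with $a = \lfloor k/t \rfloor$: Let $\ell_1 = \tilde{a}t + \tilde{b}$, where $\tilde{a} = \lfloor \ell_1/t \rfloor$ and $\tilde{b} \in [0 : t - 1]$; in this case $\tilde{a} = a$. The expression of $S$ in (15) at MBCR point is given by

\begin{align}
S &= 2\ell_1(d - at) + t^2 a(a + 1) + (\ell_1 - at)(t - b) \nonumber \\
&\overset{(a)}{=} \ell_1(2d - 2\ell_1 + 2\tilde{b}) + (\ell_1 - \tilde{b})(\ell_1 - \tilde{b} + t) + \tilde{b}(t - b) \nonumber \\
&= \ell_1(2d - \ell_1 + t) - \tilde{b}(b - \tilde{b}), \tag{39}
\end{align}

where in (a) we used $at = \ell_1 - \tilde{b}$. Therefore, (14) evaluates to

$$\mathcal{M}_s \le \mathcal{M}_s^{3} = k(2d + t - k) - \ell_1(2d - \ell_1 + t) + \tilde{b}(b - \tilde{b}). \tag{40}$$

As $\tilde{b}(b - \tilde{b}) \ge 0$ due to $k \ge \ell_1$ (note that $b - \tilde{b} = k - \ell_1$), we obtain that $\mathcal{M}_s^{*} \le \mathcal{M}_s^{3}$. Hence, (35) gives a tighter bound.

Combining the cases above, we see that the upper bound on the secure MBCR file size is given by (35). ■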

Appendix C 
NRBW values for MBCR point in DSS 

The parameters of Proposition 4 are given in the following tables. The $\ell_1 = 0$ case corresponds to systems without security constraints, and the $t = 1$ case corresponds to the non-cooperative case. Red (green) font highlights cases with greater (respectively, smaller) cooperative NRBW ($\gamma / \mathcal{M}_s$) compared to that of $t = 1$. We observed that the same trend continues for higher $n$ values.

TABLE I: NRBW for $n = 4, 5$, $d \ge k$, $d + t = n$.

TABLE II: NRBW for $n = 4, 5$, $d \ge k$, $d + t < n$.